Preface


This textbook is of an interdisciplinary nature and is designed for a two- or one-semester course in 
probability and statistics, with basic calculus as a prerequisite. The book is primarily written to give 
a sound theoretical introduction to statistics while emphasizing applications. If teaching statistics 
is the main purpose of a two-semester course in probability and statistics, this textbook covers all 
the probability concepts necessary for the theoretical development of statistics in two chapters, and 
goes on to cover all major aspects of statistical theory in two semesters, instead of only a portion of 
statistical concepts. What is more, using the optional section on computer examples at the end of 
each chapter, the student can also simultaneously learn to utilize statistical software packages for data 
analysis. It is our aim, without sacrificing any rigor, to encourage students to apply the theoretical 
concepts they have learned. There are many examples and exercises concerning diverse application 
areas that will show the pertinence of statistical methodology to solving real-world problems. The 
examples with statistical software and projects at the end of the chapters will provide good perspective 
on the usefulness of statistical methods. To introduce the students to modern and increasingly popular 
statistical methods, we have introduced separate chapters on Bayesian analysis and empirical methods. 


One of the main aims of this book is to prepare advanced undergraduates and beginning graduate 
students in the theory of statistics with emphasis on interdisciplinary applications. The audience for 
this course is regular full-time students from mathematics, statistics, engineering, physical sciences, 
business, social sciences, materials science, and so forth. Also, this textbook is suitable for people 
who work in industry and in education as a reference book on introductory statistics for a good 
theoretical foundation with clear indication of how to use statistical methods. Traditionally, one of 
the main prerequisites for this course is a one-semester introduction to probability theory. A working 
knowledge of elementary (descriptive) statistics is also a must. In schools where there is no statistics 
major, imposing such a background, in addition to the calculus sequence, is very difficult. Most of the 
present books available on this subject contain full one-semester material for probability and then, 
based on those results, continue on to the topics in statistics. Also, some of these books include in their 
subject matter only the theory of statistics, whereas others take the cookbook approach of covering 
the mechanics. Thus, even with two full semesters of work, many basic and important concepts in 
statistics are never covered. This book has been written to remedy this problem. We fuse together 
both concepts in order for students to gain knowledge of the theory and at the same time develop 
the expertise to use their knowledge in real-world situations. 


Although statistics is a very applied subject, there is no denying that it is also a very abstract subject. 
The purpose of this book is to present the subject matter in such a way that anyone with exposure 
to basic calculus can study statistics without spending two semesters of background preparation. 
To prepare students, we present an optional review of the elementary (descriptive) statistics in 
Chapter 1. All the probability material required to learn statistics is covered in two chapters. 
Students with a probability background can either review or skip the first three chapters. It is also our 
belief that no statistics course is complete without exposure to computational techniques. At 
the end of each chapter, we give some examples of how to use Minitab, SPSS, and SAS to statistically 
analyze data. Also, at the end of each chapter, there are projects that will enhance the knowledge and 
understanding of the materials covered in that chapter. In the chapter on the empirical methods, we 
present some of the modern computational and simulation techniques, such as bootstrap, jackknife, 
and Markov chain Monte Carlo methods. The last chapter summarizes some of the steps necessary 
to apply the material covered in the book to real-world problems. The first eight chapters have been 
class tested as a one-semester course for more than 3 years with five different professors teaching. 
The audience was junior- and senior-level undergraduate students from many disciplines who had 
had two semesters of calculus, most of them with no probability or statistics background. The feedback 
from the students and instructors was very positive. Recommendations from the instructors and 
students were very useful in improving the style and content of the book. 


AIM AND OBJECTIVE OF THE TEXTBOOK 


This textbook provides a calculus-based coverage of statistics and introduces students to methods of 
theoretical statistics and their applications. It assumes no prior knowledge of statistics or probability 
theory, but does require calculus. Most books at this level are written with elaborate coverage of 
probability. This requires teaching one semester of probability and then continuing with one or 
two semesters of statistics. This creates a particular problem for non-statistics majors from various 
disciplines who want to obtain a sound background in mathematical statistics and applications. 
It is our aim to introduce basic concepts of statistics with sound theoretical explanations. Because 
statistics is basically an interdisciplinary applied subject, we offer many applied examples and relevant 
exercises from different areas. Knowledge of using computers for data analysis is desirable. We present 
examples of solving statistical problems using Minitab, SPSS, and SAS. 


FEATURES 


■ During years of teaching, we observed that many students who do well in mathematics courses 
  find it difficult to understand the concept of statistics. To remedy this, we present most of 
  the material covered in the textbook with well-defined step-by-step procedures to solve real 
  problems. This clearly helps the students to approach problem solving in statistics more 
  logically. 

■ The usefulness of each statistical method introduced is illustrated by several relevant examples. 

■ At the end of each section, we provide ample exercises that are a good mix of theory and 
  applications. 

■ In each chapter, we give various projects for students to work on. These projects are designed 
  in such a way that students will start thinking about how to apply the results they learned in 
  the chapter as well as other issues they will need to know for practical situations. 

■ At the end of the chapters, we include an optional section on computer methods with Minitab, 
  SPSS, and SAS examples with clear and simple commands that the student can use to analyze 
  data. This will help students to learn how to utilize the standard methods they have learned in 
  the chapter to study real data. 

■ We introduce many of the modern statistical computational and simulation concepts, such as 
  the jackknife and bootstrap methods, the EM algorithms, and the Markov chain Monte Carlo 
  methods such as the Metropolis algorithm, the Metropolis-Hastings algorithm, and the Gibbs 
  sampler. The Metropolis algorithm was mentioned in Computing in Science and Engineering as 
  being among the top 10 algorithms having the “greatest influence on the development and 
  practice of science and engineering in the 20th century.” 

■ We have introduced the increasingly popular concept of Bayesian statistics and decision theory 
  with applications. 

■ A separate chapter on design of experiments, including a discussion on the Taguchi approach, 
  is included. 

■ The coverage of the book spans most of the important concepts in statistics. Learning the 
  material along with computational examples will prepare students to understand and utilize 
  software procedures to perform statistical analysis. 

■ Every chapter contains discussion on how to apply the concepts and what the issues are related 
  to applying the theory. 

■ A student’s solution manual, instructor’s manual, and data disk are provided. 

■ In the last chapter, we discuss some issues in applications to clearly demonstrate in a unified 
  way how to check for many assumptions in data analysis and what steps one needs to follow 
  to avoid possible pitfalls in applying the methods explained in the rest of this textbook. 


Flow Chart 


This flow chart gives some options on how to use the book in a one-semester or two-semester course. 
For a two-semester course, we recommend coverage of the complete textbook. However, Chapters 1, 
9, and 14 are optional for both one- and two-semester courses and can be given as reading exercises. 
For a one-semester course, we suggest the following options: A, B, C, D. 


Sir Ronald Fisher F.R.S. (1890-1962) was one of the leading scientists of the 20th century who 
laid the foundations for modern statistics. As a statistician working at the Rothamsted Agricultural 
Experiment Station, the oldest agricultural research institute in the United Kingdom, he also made 
major contributions to Evolutionary Biology and Genetics. The concept of randomization and the 
analysis of variance procedures that he introduced are now used throughout the world. In 1922 he 
gave a new definition of statistics. Fisher identified three fundamental problems in statistics: (1) 
specification of the type of population that the data came from; (2) estimation; and (3) distribution. 
His book Statistical Methods for Research Workers (1925) was used as a handbook for the methods for 
the design and analysis of experiments. Fisher also published the books titled The Design of Experiments 
(1935) and Statistical Tables (1947). While at the Agricultural Experiment Station he had conducted 
breeding experiments with mice, snails, and poultry, and the results he obtained led to theories about 
gene dominance and fitness that he published in The Genetical Theory of Natural Selection (1930). 


1.1 INTRODUCTION 


In today’s society, decisions are made on the basis of data. Most scientific or industrial studies and 
experiments produce data, and the analysis of these data and drawing useful conclusions from them 
become one of the central issues. The field of statistics is concerned with the scientific study of 
collecting, organizing, analyzing, and drawing conclusions from data. Statistical methods help us 
to transform data into knowledge. Statistical concepts enable us to solve problems in a diversity of 
contexts, add substance to decisions, and reduce guesswork. The discipline of statistics stemmed 
from the need to place knowledge management on a systematic evidence base. Earlier works on 
statistics dealt only with the collection, organization, and presentation of data in the form of tables 
and charts. In order to place statistical knowledge on a systematic evidence base, we require a study 
of the laws of probability. In mathematical statistics we create a probabilistic model and view the 
data as a set of random outcomes from that model. Advances in probability theory enable us to draw 
valid conclusions and to make reasonable decisions on the basis of data. 


Statistical methods are used in almost every discipline, including agriculture, astronomy, biology, 
business, communications, economics, education, electronics, geology, health sciences, and many 
other fields of science and engineering. Modern applications of statistical techniques include 
statistical communication theory and signal processing, information theory, network security and 
denial of service problems, clinical trials, artificial and biological intelligence, quality control 
of manufactured items, software reliability, and survival analysis. Statistics can aid us in several 
ways. The first is to assist us in designing experiments and surveys. We desire our experiment to 
yield adequate answers to the questions that prompted the experiment or survey. We would like the 
answers to have good precision without involving a lot of expenditure. Statistically designed 
experiments facilitate development of robust products that are insensitive to changes in the 
environment and internal component variation. Another way that statistics assists us is in organizing, 
describing, summarizing, and displaying experimental data. This is termed descriptive statistics. 
A third use of statistics is in drawing inferences and making decisions based on data. For example, 
scientists may collect experimental data to prove or disprove an intuitive conjecture or hypothesis. 
Through the proper use of statistics we can conclude whether the hypothesis is valid or not. In the 
process of solving a real-life problem using statistics, the following three basic steps may be 
identified. First, consistent with the objective of the problem, we identify the model, that is, the 
appropriate statistical method. Then, we justify the applicability of the selected model to fulfill 
the aim of our problem. Last, we properly apply the related model to analyze the data and make the 
necessary decisions, which results in answering the question of our problem with minimum risk. 
Starting with Chapter 2, we will study the necessary background material to proceed with the 
development of statistical methods for solving real-world problems. 


In the present chapter we briefly review some of the basic concepts of descriptive statistics. Such 
concepts will give us a visual and descriptive presentation of the problem under investigation. Now, 
we proceed with some basic definitions. 


1.1.1 Data Collection 


One of the first problems that a statistician faces is obtaining data. The inferences that we make depend 
critically on the data that we collect and use. Data collection involves the following important steps. 


GENERAL PROCEDURE FOR DATA COLLECTION 
1. Define the objectives of the problem and proceed to develop the experiment or survey. 
2. Define the variables or parameters of interest. 
3. Define the procedures of data-collection and measuring techniques. This includes sampling 
procedures, sample size, and data-measuring devices (questionnaires, telephone interviews, etc.). 


Example 1.1.1 
We may be interested in estimating the average household income in a certain community. In this case, 
the parameter of interest is the average income of a typical household in the community. To acquire the 
data, we may send out a questionnaire or conduct a telephone interview. Once we have the data, we may 
first want to represent the data in graphical or tabular form to better understand its distributional behavior. 
Then we will use appropriate analytical techniques to estimate the parameter(s) of interest, in this case the 
average household income. 


Very often a statistician is confined to data that have already been collected, possibly even collected 
for other purposes. This makes it very difficult to determine the quality of data. Planned collection 
of data, using proper techniques, is much preferred. 


1.2 BASIC CONCEPTS 


Statistics is the science of data. This involves collecting, classifying, summarizing, organizing, 
analyzing, and interpreting data. It also involves model building. Suppose we wish to study household 
incomes in a certain neighborhood. We may decide to randomly select, say, 50 families and examine 
their household incomes. As another example, suppose we wish to determine the diameter of a rod, 
and we take 10 measurements of the diameter. When we consider these two examples, we note that 
in the first case the population (the household incomes of all families in the neighborhood) really 
exists, whereas in the second, the population (set of all possible measurements of the diameter) is 
only conceptual. In either case we can visualize the totality of the population values, of which our 
sample data are only a small part. Thus we define a population to be the set of all measurements or 
objects that are of interest and a sample to be a subset of that population. The population acts as the 
sampling frame from which a sample is selected. Now we introduce some basic notions commonly 
used in statistics. 


Definition 1.2.1 A population is the collection or set of all objects or measurements that are of interest to 
the collector. 


Example 1.2.1 
Suppose we wish to study the heights of all female students at a certain university. The population will be 
the set of the measured heights of all female students in the university. The population is not the set of all 
female students in the university. 


In real-world problems it is usually not possible to obtain information on the entire population. The 
primary objective of statistics is to collect and study a subset of the population, called a sample, to 
acquire information on some specific characteristics of the population that are of interest. 


Definition 1.2.2 A sample is a subset of data selected from a population. The size of a sample is the 
number of elements in it. 


Example 1.2.2 
We wish to estimate the percentage of defective parts produced in a factory during a given week (five days) 
by examining 20 parts produced per day. The parts will be examined each day at randomly chosen times. 
In this case “all parts produced during the week” is the population and the 100 parts selected over the five days 
constitute a sample. 
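The sampling plan of Example 1.2.2 can be sketched in Python. This is only an illustration: the daily production count (500 parts) and the true defect rate (4%) are made-up values, not figures from the example, and the function name is hypothetical. In practice the population defect rate is unknown and is exactly what the sample is used to estimate.

```python
import random

def sample_defect_rate(parts_per_day, true_defect_rate, days=5, per_day=20, seed=0):
    """Simulate the plan of Example 1.2.2 and return (sample_size, estimated_rate).

    parts_per_day and true_defect_rate are hypothetical inputs used only to
    generate an artificial population of parts.
    """
    rng = random.Random(seed)
    sampled = []
    for _ in range(days):
        # One day's production: True marks a defective part.
        day = [rng.random() < true_defect_rate for _ in range(parts_per_day)]
        # Examine per_day parts chosen at random from that day's output.
        sampled.extend(rng.sample(day, per_day))
    return len(sampled), sum(sampled) / len(sampled)

n, estimate = sample_defect_rate(parts_per_day=500, true_defect_rate=0.04)
print(f"sample size = {n}, estimated defective fraction = {estimate:.2%}")
```

The sample size is 5 x 20 = 100 regardless of how many parts are produced; only the estimate changes from one run (or seed) to the next.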


Other common examples of sample and population are: 


Political polls: The population will be all voters, whereas the sample will be the subset of voters 
we poll. 

Laboratory experiment: The population will be all the data we could have collected if we were 
to repeat the experiment a large number of times (infinite number of times) under the same 
conditions, whereas the sample will be the data actually collected by the one experiment. 

Quality control: The population will be the entire batch of items produced, say, by a machine 
or by a plant, whereas the sample will be the subset of items we tested. 

Clinical studies: The population will be all the patients with the same disease, whereas the 
sample will be the subset of patients used in the study. 

Finance: The population is all common stock listed on stock exchanges such as the New York 
Stock Exchange and the American Stock Exchange, together with over-the-counter stock. A 
collection of 20 randomly picked individual stocks from these exchanges will be a sample. 


The methods consisting mainly of organizing, summarizing, and presenting data in the form of tables, 
graphs, and charts are called descriptive statistics. The methods of drawing inferences and making 
decisions about the population using the sample are called inferential statistics. Inferential statistics 
uses probability theory. 


Definition 1.2.3 A statistical inference is an estimate, a prediction, a decision, or a generalization about 
the population based on information contained in a sample. 


For example, we may be interested in the average indoor radiation level in homes built on reclaimed 
phosphate mine lands (many of the homes in west-central Florida are built on such lands). In this 
case, we can collect indoor radiation levels for a random sample of homes selected from this area, 
and use the data to infer the average indoor radiation level for the entire region. In the Florida Keys, 
one of the concerns is that the coral reefs are declining because of the prevailing ecosystems. In order 
to test this, one can randomly select certain reef sites for study and, based on these data, infer whether 
there is a net increase or decrease in coral reefs in the region. Here the inferential problem could be 
finding an estimate, such as in the radiation problem, or making a decision, such as in the coral reef 
problem. We will see many other examples as we progress through the book. 


1.2.1 Types of Data 


Data can be classified in several ways. We will give two different classifications, one based on whether 
the data are measured on a numerical scale or not, and the other on whether the data are collected 
in the same time period or collected at different time periods. 


Definition 1.2.4 Quantitative data are observations measured on a numerical scale. Nonnumerical data 
that can only be classified into one of the groups of categories are said to be qualitative or categorical data. 


Example 1.2.3 
Data on response to a particular therapy could be classified as no improvement, partial improvement, or 
complete improvement. These are qualitative data. The number of minority-owned businesses in Florida 
is quantitative data. The marital status of each person in a statistics class as married or not married is 
qualitative or categorical data. The number of car accidents in different U.S. cities is quantitative data. The 
blood group of each person in a community as O, A, B, AB is qualitative data. 


Categorical data could be further classified as nominal data and ordinal data. Data characterized as 
nominal have data groups that do not have a specific order. An example of this could be state names, 
or names of the individuals, or courses by name. These do not need to be placed in any order. Data 
characterized as ordinal have groups that should be listed in a specific order. The order may be either 
increasing or decreasing. One example would be income levels. The data could have numeric values 
such as 1, 2, 3, or values such as high, medium, or low. 


Definition 1.2.5 Cross-sectional data are data collected on different elements or variables at the same 
point in time or for the same period of time. 


Example 1.2.4 
The data in Table 1.1 represent U.S. federal support for the mathematical sciences in 1996, in millions of 
dollars (source: AMS Notices). This is an example of cross-sectional data, as the data are collected in one 
time period, namely in 1996. 


Table 1.1 Federal Support for the Mathematical Sciences, 1996 (millions of dollars)

Federal agency                    Amount
National Science Foundation        91.70
  DMS                              87.70
  Other MPS                         4.00
Department of Defense              77.30
  AFOSR                            16.70
  ARO                              15.00
  DARPA                            22.90
  NSA                               2.50
  ONR                              20.20
Department of Energy               16.00
  University Support                5.50
  National Laboratories            10.50
Total, All Agencies               185.00


Definition 1.2.6 Time series data are data collected on the same element or the same variable at different 
points in time or for different periods of time. 


Example 1.2.5 
The data in Table 1.2 represent U.S. federal support for the mathematical sciences during the years 
1995-1997, in millions of dollars (source: AMS Notices). This is an example of time series data, because 
they have been collected at different time periods, 1995 through 1997. 


Table 1.2 United States Federal Support for the Mathematical Sciences in Different Years
(millions of dollars)

Agency                            1995      1996      1997
National Science Foundation      87.69     91.70     98.22
  DMS                            85.29     87.70     93.22
  Other MPS                       2.40      4.00      5.00
Department of Defense            77.40     77.30     67.80
  AFOSR                          17.40     16.70     17.10
  ARO                            15.00     15.00     13.00
  DARPA                          21.00     22.90     19.50
  NSA                             2.50      2.50      2.10
  ONR                            21.40     20.20     16.10
Department of Energy             15.70     16.00     16.00
  University Support              6.20      5.50      5.00
  National Laboratories           9.50     10.50     11.00
Total, All Agencies             180.79    185.00    182.02


For an extensive collection of statistical terms and definitions, we can refer to many sources 
such as http://www.stats.gla.ac.uk/steps/glossary/index.html. We will give some other helpful 
Internet sources that may be useful for various aspects of statistics: http://www.amstat.org/ 
(American Statistical Association), http://www.stat.ufl.edu (University of Florida statistics 
department), http://www.stats.gla.ac.uk/cti/ (collection of Web links to other useful statistics 
sites), http://www.statsoft.com/textbook/stathome.html (covers a wide range of topics; the emphasis 
is on techniques rather than concepts or mathematics), 
http://www.york.ac.uk/depts/maths/histstat/welcome.htm (some information about the history of 
statistics), http://www.isid.ac.in/ (Indian Statistical Institute), 
http://www.math.uio.no/nsf/web/index.htm (The Norwegian Statistical Society), 
http://www.rss.org.uk/ (The Royal Statistical Society), http://lib.stat.cmu.edu/ (an index of 
statistical software and routines). For energy-related statistics, refer to 
http://www.eia.doe.gov/. There are various other useful sites that you could explore based on 
your particular need. 
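As a small illustration of working with time series data, the totals in Table 1.2 can be checked and compared across years. The sketch below copies the three top-level agency rows and the "Total, All Agencies" row from the table; the variable names are illustrative and are not part of the text.

```python
# Top-level rows of Table 1.2 (millions of dollars), one value per year.
years = [1995, 1996, 1997]
support = {
    "National Science Foundation": [87.69, 91.70, 98.22],
    "Department of Defense": [77.40, 77.30, 67.80],
    "Department of Energy": [15.70, 16.00, 16.00],
}
reported_total = [180.79, 185.00, 182.02]

# Sum the agency rows for each year; rounding removes floating-point noise.
computed_total = [round(sum(row[i] for row in support.values()), 2)
                  for i in range(len(years))]

# Year-over-year percentage change in total support.
pct_change = [100 * (computed_total[i] - computed_total[i - 1]) / computed_total[i - 1]
              for i in range(1, len(years))]

for year, total in zip(years, computed_total):
    print(year, total)
for year, change in zip(years[1:], pct_change):
    print(f"{year - 1} -> {year}: {change:+.2f}%")
```

The computed totals agree with the reported row, and the sign of each percentage change shows that total support rose from 1995 to 1996 and fell from 1996 to 1997.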


EXERCISES 1.2 


1.2.1. Give your own examples for qualitative and quantitative data. Also, give examples for 
cross-sectional and time series data. 


1.2.2. Discuss how you will collect different types of data. What inferences do you want to derive 
from each of these types of data? 


1.2.3. Refer to the data in Example 1.2.4. State a few questions that you can ask about the data. 
What inferences can you make by looking at these data? 


1.2.4. Refer to the data in Example 1.2.5. Can you state a few questions that the data suggest? What 
inferences can you make by looking at these data? 


1.3 SAMPLING SCHEMES 


In any statistical analysis, it is important that we clearly define the target population. The population 
should be defined in keeping with the objectives of the study. When the entire population is included 
in the study, it is called a census study because data are gathered on every member of the population. 
In general, it is usually not possible to obtain information on the entire population because the 
population is too large to attempt a survey of all of its members, or it may not be cost effective. 
A small but carefully chosen sample can be used to represent the population. A sample is obtained by 
collecting information from only some members of the population. A good sample must reflect all the 
characteristics (of importance) of the population. Samples can reflect the important characteristics 
of the populations from which they are drawn with differing degrees of precision. A sample that 
accurately reflects its population characteristics is called a representative sample. A sample that is not 
representative of the population characteristics is called a biased sample. The reliability or accuracy 
of conclusions drawn concerning a population depends on whether or not the sample is properly 
chosen so as to represent the population sufficiently well. 


There are many sampling methods available. We mention a few commonly used simple sampling 
schemes. The choice between these sampling methods depends on (1) the nature of the problem or 
investigation, (2) the availability of good sampling frames (a list of all of the population members), 
(3) the budget or available financial resources, (4) the desired level of accuracy, and (5) the method 
by which data will be collected, such as questionnaires or interviews. 


Definition 1.3.1 A sample of size n selected in such a way that each possible sample of size n has an equal 
chance of being selected is called a simple random sample. In particular, every element of the population 
has an equal chance of being chosen. 



Example 1.3.1 
For a state lottery, 52 identical Ping-Pong balls with a number from 1 to 52 painted on each ball are put in 
a clear plastic bin. A machine thoroughly mixes the balls and then six are selected. The six numbers on the 
chosen balls are the six lottery numbers that have been selected by a simple random sampling procedure. 
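The lottery draw can be imitated with Python's standard library. This is a minimal sketch, not a description of how any real lottery operates; the function name and the fixed seed are assumptions made for the illustration.

```python
import math
import random

def draw_lottery(n_balls=52, n_drawn=6, seed=None):
    """Return n_drawn distinct numbers chosen from 1..n_balls by simple random sampling."""
    rng = random.Random(seed)
    # sample() draws without replacement, so every subset of size n_drawn
    # is equally likely to be the one selected.
    return sorted(rng.sample(range(1, n_balls + 1), n_drawn))

numbers = draw_lottery(seed=1)   # the seed is fixed only to make the run repeatable
n_possible = math.comb(52, 6)    # number of distinct six-number samples
print(numbers)
print(f"each possible sample has probability 1/{n_possible}")
```

There are 20,358,520 possible samples of six numbers from 52, and under simple random sampling each of them is equally likely.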


SOME ADVANTAGES OF SIMPLE RANDOM SAMPLING 
1. Selection of sampling observations at random ensures against possible investigator biases. 


2. Analytic computations are relatively simple, and probabilistic bounds on errors can be computed in 
many cases. 

3. It is frequently possible to estimate the sample size for a prescribed error level when designing the 
sampling procedure. 


Simple random sampling may not be effective in all situations. For example, in a U.S. presidential 
election, it may be more appropriate to conduct sampling polls by state, rather than a nationwide 
random poll. It is quite possible for a candidate to get a majority of the popular vote nationwide and 
yet lose the election. We now describe a few other sampling methods that may be more appropriate 
in a given situation. 


Definition 1.3.2 A systematic sample is a sample in which every Kth element in the sampling frame is 
selected after a suitable random start for the first element. We list the population elements in some order (say 
alphabetical) and choose the desired sampling fraction. 


STEPS FOR SELECTING A SYSTEMATIC SAMPLE 
1. Number the elements of the population from 1 to N. 


2. Decide on the sample size, say n, that we need. 
3. Choose K = N/n. 

4. Randomly select an integer between 1 and K, inclusive. 

5. Then take every Kth element. 




Example 1.3.2 
If the population has 1000 elements arranged in some order and we decide to sample 10% (i.e., N = 1000 
and n = 100), then K = 1000/100 = 10. Pick a number at random between 1 and K = 10 inclusive, say 3. 
Then select elements numbered 3, 13, 23, ..., 993. 
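The steps for selecting a systematic sample translate directly into a few lines of Python. In this sketch the function name is hypothetical, and using integer division for K when N is not a multiple of n is an assumption; the text only treats the case where N/n is a whole number.

```python
import random

def systematic_sample(N, n, seed=None):
    """Select a systematic sample of element numbers from 1..N.

    K = N // n (integer division is an assumption for the case where N is
    not a multiple of n), a random start between 1 and K inclusive, then
    every Kth element after the start.
    """
    K = N // n
    rng = random.Random(seed)
    start = rng.randint(1, K)          # randint includes both endpoints
    return start, list(range(start, N + 1, K))

start, chosen = systematic_sample(N=1000, n=100, seed=0)
print(start, chosen[:3], "...", chosen[-1], len(chosen))
```

With N = 1000 and n = 100 the sample always has exactly 100 elements, and a start of 3 reproduces the elements 3, 13, 23, ..., 993 of Example 1.3.2. When N is not a multiple of n, the sample size produced this way can differ slightly from n.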



Systematic sampling is widely used because it is easy to implement. If the list of population elements 
is in random order to begin with, then the method is similar to simple random sampling. If, however, 
there is a correlation or association between successive elements, or if there is some periodic 
structure, then this sampling method may introduce biases. Systematic sampling is often used to select a 
specified number of records from a computer file. 
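The warning about periodic structure can be made concrete with a small artificial population. In the sketch below the values cycle 0, 1, ..., 9 (a made-up pattern chosen only for illustration), and the sampling interval K = 10 matches that period.

```python
# Hypothetical population of 1000 values that cycle 0, 1, ..., 9.
N, K = 1000, 10
population = [(i - 1) % 10 for i in range(1, N + 1)]   # value of element number i

population_mean = sum(population) / N

# Mean of the systematic sample for every possible start 1..K.
sample_means = {}
for start in range(1, K + 1):
    sample = [population[i - 1] for i in range(start, N + 1, K)]
    sample_means[start] = sum(sample) / len(sample)

print("population mean:", population_mean)
print("sample means by start:", sample_means)
```

Because the interval equals the period, each systematic sample consists of a single repeated value. The sample mean is therefore start - 1, and no choice of start yields the population mean of 4.5.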


Definition 1.3.3 A sample obtained by stratifying (dividing into nonoverlapping groups) the sampling frame 
based on some factor or factors and then selecting some elements from each of the strata is called a 
stratified sample. Stratified sampling is a modification of simple random sampling and systematic sampling 
and is designed to obtain a more representative sample, but at the cost of a more complicated procedure. 
Compared to simple random sampling, stratified sampling can reduce sampling error. Here, a population with 
N elements is divided into s subpopulations. A sample is drawn from each subpopulation independently. The 
size of each subpopulation and the sample size in each subpopulation may vary. 


STEPS FOR SELECTING A STRATIFIED SAMPLE 
1. Decide on the relevant stratification factors (sex, age, income, etc.). 


2. Divide the entire population into strata (subpopulations) based on the stratification criteria. Sizes of 
strata may vary. 

3. Select the requisite number of units using simple random sampling or systematic sampling from 
each subpopulation. The requisite number may depend on the subpopulation sizes. 


Examples of strata might be males and females, undergraduate students and graduate students, 
managers and nonmanagers, or populations of clients in different racial groups such as African 
Americans, Asians, whites, and Hispanics. Stratified sampling is often used when one or more of the 
strata in the population have a low incidence relative to the other strata. 




Example 1.3.3 
In a population of 1000 children from an area school, there are 600 boys and 400 girls. We divide them into 
strata based on their parents’ income as shown in Table 1.3. 


Table 1.3 Classification of School Children 

                Boys    Girls 
Poor            120     240 
Middle Class    150     100 
Rich            330     60 


This is stratified data. 


Example 1.3.4 
Refer to Example 1.3.3. Suppose we decide to sample 100 children from the population of 1000 (that is, 
10% of the population). We also choose to sample 10% from each of the categories. For example, we would 
choose 12 (10% of 120) poor boys, 6 (10% of 60) rich girls, and so forth. This yields Table 1.4. This particular 
sampling method is called proportional stratified sampling. 


Table 1.4 Proportional Stratification of School Children 

                Boys    Girls 
Poor            12      24 
Middle Class    15      10 
Rich            33      6 
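

The proportional allocation of Example 1.3.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the labeling of children by integers within each stratum, and the seed are hypothetical and are not part of the text. The stratum sizes are those of Table 1.3.

```python
import random

# Stratum sizes from Table 1.3 (parents' income by sex of the child).
strata_sizes = {
    ("Poor", "Boys"): 120, ("Poor", "Girls"): 240,
    ("Middle Class", "Boys"): 150, ("Middle Class", "Girls"): 100,
    ("Rich", "Boys"): 330, ("Rich", "Girls"): 60,
}

def proportional_allocation(sizes, fraction):
    """Number of units to draw from each stratum at a common sampling fraction."""
    return {stratum: round(size * fraction) for stratum, size in sizes.items()}

def draw_stratified_sample(frame, allocation, seed=None):
    """Draw a simple random sample of the allotted size within each stratum.

    `frame` maps each stratum to the list of its units.
    """
    rng = random.Random(seed)
    return {s: rng.sample(frame[s], allocation[s]) for s in frame}

allocation = proportional_allocation(strata_sizes, 0.10)

# Hypothetical sampling frame: children are labeled 0, 1, 2, ... within each stratum.
frame = {s: list(range(n)) for s, n in strata_sizes.items()}
sample = draw_stratified_sample(frame, allocation, seed=1)

for stratum, count in allocation.items():
    print(stratum, count)
```

The allocation step reproduces the counts of Table 1.4; the drawing step applies simple random sampling independently within each stratum, as in step 3 of the procedure.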


SOME USES OF STRATIFIED SAMPLING 

1. In addition to providing information about the whole population, this sampling scheme provides 
information about the subpopulations, the study of which may be of interest. For example, in a U.S. 
presidential election, opinion polls by state may be more important in deciding on the electoral 
college advantage than a national opinion poll. 

2. Stratified sampling can be considerably more precise than a simple random sample, because the 
population is fairly homogeneous within each stratum but there is a sizable variation between the 
strata. 


Definition 1.3.4 In cluster sampling, the sampling unit contains groups of elements called clusters instead 
of individual elements of the population. A cluster is an intact group naturally available in the field. Unlike 
the stratified sample where the strata are created by the researcher based on stratification variables, the clusters 
naturally exist and are not formed by the researcher for data collection. Cluster sampling is also called area 
sampling. 


To obtain a cluster sample, first take a simple random sample of groups and then sample all elements 
within the selected clusters (groups). Cluster sampling is convenient to implement. However, because 
it is likely that units in a cluster will be relatively homogeneous, this method may be less precise than 
simple random sampling. 


Example 1.3.5 
Suppose we wish to select a sample of about 10% from all fifth-grade children of a county. We randomly 
select 10% of the elementary schools assumed to have approximately the same number of fifth-grade 
students and select all fifth-grade children from these schools. This is an example of cluster sampling, each 
cluster being an elementary school that was selected. 


Definition 1.3.5 Multiphase sampling involves collection of some information from the whole sample and 
additional information either at the same time or later from subsamples of the whole sample. The multiphase 
or multistage sampling is basically a combination of the techniques presented earlier. 


Example 1.3.6 
An investigator in a population census may ask basic questions such as sex, age, or marital status for the 
whole population, but only 10% of the population may be asked about their level of education or about 
how many years of mathematics and science education they had. 


1.3.1 Errors in Sample Data 


Irrespective of which sampling scheme is used, the sample observations are prone to various sources 
of error that may seriously affect the inferences about the population. Some sources of error can 
be controlled. However, others may be unavoidable because they are inherent in the nature of the 
sampling process. Consequently, it is necessary to understand the different types of errors for a proper 
interpretation and analysis of the sample data. The errors can be classified as sampling errors and 
nonsampling errors. Nonsampling errors occur in the collection, recording and processing of sample 
data. For example, such errors could occur as a result of bias in selection of elements of the sample, 
poorly designed survey questions, measurement and recording errors, incorrect responses, or no 
responses from individuals selected from the population. Sampling errors occur because the sample 
is not an exact representative of the population. Sampling error is due to the differences between the 
characteristics of the population and those of a sample from the population. For example, suppose we are 
interested in the average test score in a large statistics class of size, say, 80. A sample of 10 grades 
from this class resulted in an average test score of 75. If the average test score for all 80 students (the 
population) is 72, then the sampling error is 75 - 72 = 3. 


1.3.2 Sample Size 


In almost any sampling scheme designed by statisticians, one of the major issues is the determination 
of the sample size. In principle, this should depend on the variation in the population as well as on 
the population size, and on the required reliability of the results, that is, the amount of error that 
can be tolerated. For example, if we are taking a sample of school children from a neighborhood 
with a relatively homogeneous income level to study the effect of parents’ affluence on the academic 
performance of the children, it is not necessary to have a large sample size. However, if the income 
level varies a great deal in the feeding area of the school, then we will need a larger sample size to 
achieve the same level of reliability. In practice, another influencing factor is the available resources 
such as money and time. In later chapters, we present some methods of determining sample size in 
statistical estimation problems. 


The literature on sample survey methods is constantly changing with new insights that demand 
dramatic revisions in the conventional thinking. We know that representative sampling methods 
are essential to permit confident generalizations of results to populations. However, there are many 
practical issues that can arise in real-life sampling methods. For example, in sampling related to 
social issues, whatever the sampling method we employ, a high response rate must be obtained. It 
has been observed that most telephone surveys have difficulty in achieving response rates higher 
than 60%, and most face-to-face surveys have difficulty in achieving response rates higher than 70%. 
Even a well-designed survey may stop short of the goal of a perfect response rate. This might induce 
bias in the conclusions based on the sample we obtained. A low response rate can be devastating to 
the reliability of a study. We can obtain series of publications on surveys, including guidelines on 
avoiding pitfalls from the American Statistical Association (www.amstat.org). In this book, we deal 
mainly with samples obtained using simple random sampling. 


EXERCISES 1.3 


1.3.1. Give your own examples for each of the sampling methods described in this section. Discuss 
the merits and limitations of each of these methods. 


1.3.2. Using the information obtained from the publications of the American Statistical Association 
(www.amstat.org), write a short report on how to collect survey data, and what the potential 
sources of error are. 


1.4 GRAPHICAL REPRESENTATION OF DATA 


The source of our statistical knowledge lies in the data. Once we obtain the sample data values, one 
way to become acquainted with them is to display them in tables or graphically. Charts and graphs 
are very important tools in statistics because they communicate information visually. These visual 
displays may reveal the patterns of behavior of the variables being studied. In this chapter, we will 
consider one-variable data. The most common graphical displays are the frequency table, pie chart, 
bar graph, Pareto chart, and histogram. For example, in the business world, graphical representations 
of data are used as statistical tools for everyday process management and improvements by decision 
makers (such as managers and frontline staff) to understand processes, problems, and solutions. The 
purpose of this section is to introduce several tabular and graphical procedures commonly used to 
summarize both qualitative and quantitative data. Tabular and graphical summaries of data can be 
found in reports, newspaper articles, Web sites, and research studies, among others. 


Now we shall introduce some ways of graphically representing both qualitative and quantitative data. 
Bar graphs and Pareto charts are useful displays for qualitative data. 


Definition 1.4.1 A graph of bars whose heights represent the frequencies (or relative frequencies) of respective 
categories is called a bar graph. 


Example 1.4.1 
The data in Table 1.5 represent the percentages of price increases of some consumer goods and services 
for the period December 1990 to December 2000 in a certain city. Construct a bar chart for these data. 


Table 1.5 Percentages of Price Increases of Some Consumer Goods and Services 

Medical Care            83.3% 
Electricity             22.1% 
Residential Rent        43.5% 
Food                    41.1% 
Consumer Price Index    35.8% 
Apparel & Upkeep        21.2% 


Solution 
In the bar graph of Figure 1.1, we use the notations MC for medical care, El for electricity, RR for residential 
rent, Fd for food, CPI for consumer price index, and A & U for apparel and upkeep. 

FIGURE 1.1 Percentage price increase of consumer goods. 


Looking at Figure 1.1, we can identify where the maximum and minimum responses are located, so 
that we can descriptively discuss the phenomenon whose behavior we want to understand. 


For a graphical representation of the relative importance of different factors under study, one can use 
the Pareto chart. It is a bar graph with the height of the bars proportional to the contribution of each 
factor. The bars are displayed from the most numerous category to the least numerous category, as 
illustrated by the following example. A Pareto chart helps separate the significant few factors that 
have a larger influence from the trivial many. 


Example 1.4.2 
For the data of Example 1.4.1, construct a Pareto chart. 


Solution 
First, rewrite the data in decreasing order. Then create a Pareto chart by displaying the bars from the most 
numerous category to the least numerous category. 


Looking at Figure 1.2, we can identify the relative importance of each category such as the maximum, 
the minimum, and the general behavior of the subject data. 
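

The ordering step behind a Pareto chart can be sketched as follows, using the data of Table 1.5. The helper names `pareto_order` and `text_bar_chart` are hypothetical, and the text rendering (one `#` per two percentage points) is only a stand-in for the drawn bars of a real chart.

```python
# Percentage price increases from Table 1.5.
price_increase = {
    "Medical Care": 83.3,
    "Electricity": 22.1,
    "Residential Rent": 43.5,
    "Food": 41.1,
    "Consumer Price Index": 35.8,
    "Apparel & Upkeep": 21.2,
}

def pareto_order(values):
    """Return (category, value) pairs sorted from largest to smallest value."""
    return sorted(values.items(), key=lambda item: item[1], reverse=True)

def text_bar_chart(pairs, scale=2.0):
    """Render one text bar per category, one '#' per `scale` units."""
    width = max(len(name) for name, _ in pairs)
    return "\n".join(
        f"{name:<{width}} | {'#' * round(value / scale)} {value}"
        for name, value in pairs
    )

ordered = pareto_order(price_increase)
print(text_bar_chart(ordered))
```

Plotting the same pairs in their original (unsorted) order would give the ordinary bar graph of Example 1.4.1; only the sorting distinguishes the two displays.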


Vilfredo Pareto (1848-1923), an Italian economist and sociologist, studied the distributions of wealth 
in different countries. He concluded that about 20% of people controlled about 80% of a society’s 
wealth. This same distribution has been observed in other areas such as quality improvement: 80% 
of problems usually stem from 20% of the causes. This phenomenon has been termed the Pareto 
effect or 80/20 rule. Pareto charts are used to display the Pareto principle, arranging data so that 
the few vital factors that are causing most of the problems reveal themselves. Focusing improvement 
efforts on these few causes will have a larger impact and be more cost-effective than undirected 
efforts. Pareto charts are used in business decision making as a problem-solving and statistical tool 
that ranks problem areas, or sources of variation, according to their contribution to cost or to total 
variation. 


FIGURE 1.2 Pareto chart. 


Definition 1.4.2 A circle divided into sectors that represent the percentages of a population or a sample that 
belong to different categories is called a pie chart. 


Pie charts are especially useful for presenting categorical data. The pie “slices” are drawn such that 
they have an area proportional to the frequency. The entire pie represents all the data, whereas each 
slice represents a different class or group within the whole. Thus, we can look at a pie chart and 
identify the various percentages of interest and how they compare among themselves. Most statistical 
software can create 3D charts. Such charts are attractive; however, they can make pieces at the front 
look larger than they really are. In general, a two-dimensional view of the pie is preferable. 




Example 1.4.3 
The combined percentages of carbon monoxide (CO) and ozone (O3) emissions from different sources are 
listed in Table 1.6. 


Table 1.6 Combined Percentages of CO and O3 Emissions 

Transportation (T) | Industrial process (I) | Fuel combustion (F) | Solid waste (S) | Miscellaneous (M) 
63%                | 10%                    | 14%                 | 5%              | 8% 


Construct a pie chart. 


Solution 
The pie chart is given in Figure 1.3. 
FIGURE 1.3 Pie chart for CO and O3. 


Definition 1.4.3 A stem-and-leaf plot is a simple way of summarizing quantitative data and is well suited 
to computer applications. When data sets are relatively small, stem-and-leaf plots are particularly useful. In a 
stem-and-leaf plot, each data value is split into a “stem” and a “leaf.” The “leaf” is usually the last digit of 
the number and the other digits to the left of the “leaf” form the “stem.” Usually there is no need to sort the 
leaves, although computer packages typically do. For more details, we refer the student to elementary statistics 
books. We illustrate this technique by an example. 




Example 1.4.4 


Construct a stem-and-leaf plot for the 20 test scores given below. 


78 | 74 | 82 | 66 | 94 | 71 | 64 | 88 | 55 | 80 
91 | 74 | 82 | 75 | 96 | 78 | 84 | 79 | 71 | 83 


Solution 


At a glance, we see that the scores are distributed from the 50s through the 90s. We use the first digit of 
the score as the stem and the second digit as the leaf. The plot in Table 1.7 is constructed with stems in the 
vertical position. 


Table 1.7 Stem-and-Leaf Display of 20 Exam Scores 

Stem   Leaves 
5      5 
6      6 4 
7      8 4 1 4 5 8 9 1 
8      2 8 0 2 4 3 
9      4 1 6 
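

The construction used in Example 1.4.4 can be sketched in Python as follows. The function name `stem_and_leaf` is hypothetical, and the sketch assumes nonnegative whole-number data with the last digit taken as the leaf; the leaves are kept in the order in which the scores appear, without sorting.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by stem (all digits but the last); leaves keep data order."""
    plot = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(sorted(plot.items()))

# The 20 test scores of Example 1.4.4.
scores = [78, 74, 82, 66, 94, 71, 64, 88, 55, 80,
          91, 74, 82, 75, 96, 78, 84, 79, 71, 83]

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Because every observation contributes exactly one leaf, the display retains the entire data set; sorting each list of leaves would give the ordered version that many software packages print.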


The stem-and-leaf plot condenses the data values into a useful display from which we can identify 
the shape and distribution of data such as the symmetry, where the maximum and minimum are 
located with respect to the frequencies, and whether they are bell shaped. Whether the frequencies 
are bell shaped will be of paramount importance as we proceed to study inferential statistics. Also, 
note that the stem-and-leaf plot retains the entire data set and can be used only with quantitative 
data. Examples 1.8.1 and 1.8.6 explain how to obtain a stem-and-leaf plot using Minitab and 
SPSS, respectively. Refer to Section 1.8.3 for SAS commands to generate graphical representations of 
the data. 


A frequency table is a table that divides a data set into a suitable number of categories (classes). Rather 
than retaining the entire set of data in a display, a frequency table essentially provides only a count 
of those observations that are associated with each class. Once the data are summarized in the form 
of a frequency table, a graphical representation can be given through bar graphs, pie charts, and 
histograms. Data presented in the form of a frequency table are called grouped data. A frequency 
table is created by choosing a specific number of classes in which the data will be placed. Generally 
the classes will be intervals of equal length. The center of each class is called a class mark. The end 
points of each class interval are called class boundaries. Usually, there are two ways of choosing class 
boundaries. One way is to choose nonoverlapping class boundaries so that none of the data points 
will simultaneously fall in two classes. Another way is that for each class, except the last, the upper 
boundary is equal to the lower boundary of the subsequent class. When forming a frequency table 
this way, one or more data values may fall on a class boundary. One way to handle such a value 
is to assign it arbitrarily to one of the classes or to flip a coin to determine the class into which to place 
the observation at hand. 


Definition 1.4.4 Let $f_i$ denote the frequency of the class i and let n be the sum of all frequencies. Then the 
relative frequency for the class i is defined as the ratio $f_i/n$. The cumulative relative frequency for the 
class i is defined by $\sum_{k=1}^{i} f_k/n$. 


The following example illustrates the foregoing discussion. 


Example 1.4.5 
The following data give the lifetime of 30 incandescent light bulbs (rounded to the nearest hour) of a 
particular type. 


 872 |  931 | 1146 | 1079 |  915 |  879 |  863 | 1112 |  979 | 1120 
1150 |  987 |  958 | 1149 | 1057 | 1082 | 1053 | 1048 | 1118 | 1088 
 868 |  996 | 1102 | 1130 | 1002 |  990 | 1052 | 1116 | 1119 | 1028 


Construct a frequency, relative frequency, and cumulative relative frequency table. 


Solution 
Note that there are n = 30 observations and that the largest observation is 1150 and the smallest one is 
863 with a range of 287. We will choose six classes each with a length of 50. 


Class        Frequency    Relative frequency    Cumulative relative frequency 
             $f_i$        $f_i/n$               $\sum_{k=1}^{i} f_k/n$ 

850–900      4            4/30                  4/30 
900–950      2            2/30                  6/30 
950–1000     5            5/30                  11/30 
1000–1050    3            3/30                  14/30 
1050–1100    6            6/30                  20/30 
1100–1150    10           10/30                 30/30 
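

The table of Example 1.4.5 can be recomputed with the short Python sketch below. The function name and the boundary convention are assumptions made for the illustration: each class is taken as closed on the left and open on the right, except the last class, which also includes its upper boundary so that the observation 1150 is counted.

```python
from fractions import Fraction
from itertools import accumulate

# Lifetimes (in hours) of the 30 light bulbs of Example 1.4.5.
lifetimes = [872, 931, 1146, 1079, 915, 879, 863, 1112, 979, 1120,
             1150, 987, 958, 1149, 1057, 1082, 1053, 1048, 1118, 1088,
             868, 996, 1102, 1130, 1002, 990, 1052, 1116, 1119, 1028]

def frequency_table(data, start, width, k):
    """Rows (lower, upper, f_i, f_i/n, cumulative) for k equal-width classes.

    Assumed convention: class i is [lower, upper), except the last class,
    which also includes its upper boundary.
    """
    n = len(data)
    bounds = [start + j * width for j in range(k + 1)]
    freqs = [0] * k
    for x in data:
        if not bounds[0] <= x <= bounds[-1]:
            raise ValueError(f"{x} lies outside the classes")
        j = min(int((x - start) // width), k - 1)
        freqs[j] += 1
    cumulative = list(accumulate(freqs))
    return [(bounds[j], bounds[j + 1], freqs[j],
             Fraction(freqs[j], n), Fraction(cumulative[j], n))
            for j in range(k)]

table = frequency_table(lifetimes, start=850, width=50, k=6)
for lower, upper, f, rel, cum in table:
    print(f"{lower}-{upper}  {f:2d}  {float(rel):.3f}  {float(cum):.3f}")
```

The relative frequencies are stored as exact fractions (which Python reduces, so 4/30 appears as 2/15) and converted to decimals only for printing.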


When data are quantitative in nature and the number of observations is relatively large, and there are 
no natural separate categories or classes, we can use a histogram to simplify and organize the data. 


Definition 1.4.5 A histogram is a graph in which classes are marked on the horizontal axis and either 
the frequencies, relative frequencies, or percentages are represented by the heights on the vertical axis. In a 
histogram, the bars are drawn adjacent to each other without any gaps. 


Histograms can be used only for quantitative data. A histogram compresses a data set into a compact 
picture that shows the location of the mean and modes of the data and the variation in the data, 
especially the range. It identifies patterns in the data. This is a good aggregate graph of one variable. 
In order to obtain the variability in the data, it is always a good practice to start with a histogram of 
the data. The following steps can be used as a general guideline to construct a frequency table and 
produce a histogram. 


GUIDELINE FOR THE CONSTRUCTION OF A FREQUENCY TABLE AND HISTOGRAM 

1. Determine the maximum and minimum values of the observations. The range is 
R = maximum value - minimum value. 

2. Select from five to 20 classes that in general are nonoverlapping intervals of equal length, so as to 
cover the entire range of data. The goal is to use enough classes to show the variation in the data, 
but not so many that there are only a few data points in many of the classes. The class width should 
be slightly larger than the ratio 

$$\frac{\text{Largest value} - \text{Smallest value}}{\text{Number of classes}}.$$


3. The first interval should begin a little below the minimum value, and the last interval should end a 
little above the maximum value. The intervals are called class intervals and the boundaries are called 
class boundaries. The class limits are the smallest and the largest data values in the class. The class 
mark is the midpoint of a class. 

4. None of the data values should fall on the boundaries of the classes. 

5. Construct a table (frequency table) that lists the class intervals, a tabulation of the number of 
measurements in each class (tally), the frequency $f_i$ of each class, and, if needed, a column with 
relative frequency, $f_i/n$, where n is the total number of observations. 

6. Draw bars over each interval with heights being the frequencies (or relative frequencies). 


Let us illustrate implementing these steps in the development of a histogram for the data given in the 
following example. 




Example 1.4.6 
The following data refer to a certain type of chemical impurity measured in parts per million in 25 drinking- 
water samples randomly collected from different areas of a county. 


11 | 19 | 24 | 30 | 12 | 20 | 25 | 29 | 15 | 21 
24 | 31 | 16 | 23 | 25 | 26 | 32 | 17 | 22 | 26 
35 | 18 | 24 | 18 | 27 


(a) Make a frequency table displaying class intervals, frequencies, relative frequencies, and percent- 
ages. 
(b) Construct a frequency histogram. 


Solution 
(a) We will use five classes. The maximum and minimum values in the data set are 35 and 11. Hence 
the ratio of the range to the number of classes is (35 - 11)/5 = 4.8, and we shall take the class width 
to be 5. The lower boundary of the first class interval will be chosen to be 10.5. With five classes, each 
of width 5, the upper boundary of the fifth class becomes 35.5. We can now construct the frequency 
table for the data. 

Class   Class interval   Frequency $f_i$   Relative frequency   Percentage 
1       10.5–15.5        3                 3/25 = 0.12          12 
2       15.5–20.5        6                 6/25 = 0.24          24 
3       20.5–25.5        8                 8/25 = 0.32          32 
4       25.5–30.5        5                 5/25 = 0.20          20 
5       30.5–35.5        3                 3/25 = 0.12          12 

(b) We can generate a histogram as in Figure 1.4. 
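

The guideline can be followed step by step in Python for the impurity data of Example 1.4.6. This is a sketch under stated assumptions: the function names are hypothetical, the data are taken to be whole numbers, the class width is the smallest whole number exceeding the ratio in step 2, and the first class starts 0.5 below the minimum so that no observation falls on a boundary. The rows of asterisks are a crude text stand-in for the bars of a histogram.

```python
import math

# Impurity (parts per million) in the 25 water samples of Example 1.4.6.
impurity = [11, 19, 24, 30, 12, 20, 25, 29, 15, 21,
            24, 31, 16, 23, 25, 26, 32, 17, 22, 26,
            35, 18, 24, 18, 27]

def build_classes(data, k, offset=0.5):
    """Choose k equal-width classes for whole-number data.

    The width is the smallest whole number strictly larger than range/k,
    and the first class starts `offset` below the minimum value.
    """
    lo, hi = min(data), max(data)
    width = math.floor((hi - lo) / k) + 1
    start = lo - offset
    return [(start + j * width, start + (j + 1) * width) for j in range(k)]

def class_frequencies(data, classes):
    """Count the observations strictly inside each class interval."""
    return [sum(1 for x in data if a < x < b) for a, b in classes]

classes = build_classes(impurity, k=5)
freqs = class_frequencies(impurity, classes)
n = len(impurity)
for (a, b), f in zip(classes, freqs):
    print(f"{a:4.1f}-{b:4.1f}  {'*' * f:<8}  {f}  {f / n:.2f}")
```

With five classes the sketch arrives at the same boundaries (10.5 through 35.5) and the same frequencies as the table in part (a).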


From the histogram we should be able to identify the center (i.e., the location) of the data, spread 
of the data, skewness of the data, presence of outliers, presence of multiple modes in the data, and 
whether the data can be capped with a bell-shaped curve. These properties provide indications of the 
proper distributional model for the data. Examples 1.8.2 and 1.8.7 explain how to obtain histograms 
using Minitab and SPSS, respectively. 


FIGURE 1.4 Frequency histogram of impurity data. 


EXERCISES 1.4 


1.4.1. According to the recent U.S. Federal Highway Administration Highway Statistics, the per- 
centages of freeways and expressways in various road mileage-related highway pavement 
conditions are as follows: 

Poor 10%, Mediocre 32%, Fair 22%, Good 21%, and Very good 15%. 
(a) Construct a bar graph. 
(b) Construct a pie chart. 


1.4.2. More than 75% of all species that have been described by biologists are insects. Of the 
approximately 2 million known species, only about 30,000 are aquatic in any life stage. The 
data in Table 1.4.1 give proportion of total species by insect order that can survive exposure 
to salt (source: http://entomology.unl.edu/marine_insects/marineinsects.htm). 


Table 1.4.1 

Species Percentage Species Percentage 
Coleoptera 26% Odonata 3% 
Diptera 35% Thysanoptera 3% 
Hemiptera 15% Lepidoptera 1% 
Orthoptera 6% Other 6% 
Collembola 5% 

(a) Construct a bar graph. 
(b) Construct a Pareto chart. 
(c) Construct a pie chart. 


1.4.3. The data in Table 1.4.2 are presented to illustrate the role of renewable energy consumption 
in the U.S. energy supply in 2007 (source: http://www.eia.doe.gov/fuelrenewable.html). 
Renewable energy consists of biomass, geothermal energy, hydroelectric energy, solar energy, 
and wind energy. 


Table 1.4.2 


Source Percentage 
Coal 22% 
Natural Gas 23% 
Nuclear Electric Power 8% 
Petroleum 40% 
Renewable Energy 7% 


(a) Construct a bar graph. 
(b) Construct a Pareto chart. 
(c) Construct a pie chart. 


1.4.4. A litter is a group of babies born from the same mother at the same time. Table 1.4.3 
gives some examples of different mammals and their average litter size (source: http:// 
www.saburchill.com/chapters/chap0032.html). 


Table 1.4.3 

Species Litter size 
Bat 1 
Dolphin 1 
Chimpanzee 1 
Lion 3 
Hedgehog > 
Red Fox 6 
Rabbit 6 
Black Rat 11 

(a) Construct a bar graph. 
(b) Construct a Pareto chart. 


1.4.5. The following data give the letter grades of 20 students enrolled in a statistics course. 


A | B | F | A | C | C | D | A | B | F 
C | D | B | A | B | A | F | B | C | A 


(a) Construct a bar graph. 
(b) Construct a pie chart. 


1.4.6. According to the U.S. Bureau of Labor Statistics (BLS), the median weekly earnings of full- 
time wage and salary workers by age for the third quarter of 1998 is given in Table 1.4.4. 


Table 1.4.4 

16 to 19 years $260 
20 to 24 years $334 
25 to 34 years $498 
35 to 44 years $600 
45 to 54 years $628 
55 to 64 years $605 
65 years and over $393 


Construct a pie chart and bar graph for these data and interpret. Also, construct a Pareto 
chart. 


1.4.7. The data in Table 1.4.5 are a breakdown of 18,930 workers in a town according to the type 
of work. 


Table 1.4.5 

Mining                                     58 
Construction                             1161 
Manufacturing                            2188 
Transportation and Public Utilities       821 
Wholesale Trade                           657 
Retail Trade                             7377 
Finance, Insurance, and Real Estate       890 
Services                                 5778 
Total                                  18,930 


Construct a pie chart and bar graph for these data and interpret. 


1.4.8. The data in Table 1.4.6 represent the number (in millions) of adults and children living 
with HIV/AIDS by the end of 2000 according to the region of the world (source: 
http://w3.whosea.org/hivaids/factsheet.htm). 


Table 1.4.6 

Country                            Adults and children living 
                                   with HIV/AIDS (in millions) 
Sub-Saharan Africa                 25.30 
North Africa and Middle East       0.40 
South and Southeast Asia           5.80 
East Asia and Pacific              0.64 
Latin America                      1.40 
Caribbean                          0.39 
Eastern Europe and Central Asia    0.70 
Western Europe                     0.54 
North America                      0.92 
Australia and New Zealand          0.15 


Construct a bar graph for these data. Also, construct a Pareto chart and interpret. 


1.4.9. The data in Table 1.4.7 give the life expectancy at birth, in years, from 1900 through 2000 
(source: National Center for Health Statistics). 


Table 1.4.7 

Year    Life expectancy 
1900    47.3 
1960    69.7 
1980    73.7 
1990    75.4 
2000    77.0 


Construct a bar graph for these data. 


1.4.10. Dolphins are usually identified by the shape and pattern of notches and nicks on their dorsal 
fin. Individual dolphins are cataloged by classifying the fin based on location of distinguishing 
marks. When a dolphin is sighted its picture can then be compared to the catalog of 
dolphins in the area, and if a match is found, the dolphin can be recorded as resighted. These 
methods of mark-resight are for developing databases regarding the life history of individual 
dolphins. From these databases we can calculate the levels of association between dolphins, 
population estimates, and general life history parameters such as birth and survival rates. 
The data in Table 1.4.8 represent frequently resighted individuals (as of January 2000) at a 
particular location (source: http://www.eckerd.edu/dolphinproject/biologypr.html). 


Table 1.4.8 

Hammer (adult female)             59 
Mid Button Flag (adult female)    41 
Luseal (adult female)             31 
84 Lookalike (adult female)       20 


Construct a bar graph for these data. 


1.4.11. The data in Table 1.4.9 give death rates (per 100,000 population) for 10 leading causes in 
1998 (source: National Center for Health Statistics, U.S. Department of Health and Human 
Services). 

(a) Construct a bar graph. 
(b) Construct a Pareto chart. 


1.4.12. In a fiscal year, a city collected $32.3 million in revenues. City spending for that year is 
expected to be nearly the same, with no tax increase projected. 
Expenditure: Reserves 0.7%, capital outlay 29.7%, operating expenses 28.9%, debt service 
3.2%, transfers 5.1%, personal services 32.4%. 
Revenues: Property taxes 10.2%, utility and franchise taxes 11.3%, licenses and permits 1%, 
intergovernmental revenue 10.1%, charges for services 28.2%, fines and forfeits 0.5%, 
interest and miscellaneous 2.7%, transfers and cash carryovers 36%. 
(a) Construct bar graphs for expenditure and revenues and interpret. 
(b) Construct pie charts for expenditure and revenues and interpret. 


Table 1.4.9 

Cause Death rate 
Accidents and Adverse Effects 34.5 
Chronic Liver Disease and Cirrhosis 9.7 
Chronic Obstructive Lung Diseases and Allied Conditions 42.3 
Cancer 199.4 
Diabetes Mellitus 23.9 
Heart Disease 268.0 
Kidney Disease 9.7 
Pneumonia and Influenza 35.1 
Stroke 58.5 
Suicide 10.8 


1.4.13. Construct a histogram for the 24 examination scores given next. 


78 | 74 | 82 | 66 | 94 | 71 | 64 | 88 | 55 | 80 | 73 | 86 
91 | 74 | 82 | 75 | 96 | 78 | 84 | 79 | 71 | 83 | 78 | 79 


1.4.14. The following table gives radon concentration in pCi/liter obtained from 40 houses in a 
certain area. 


 2.9 | 0.6 | 13.5 | 17.1 |  2.8 | 3.8 | 16.0 |  2.1 | 6.4 | 17.2 
 7.9 | 0.5 | 13.7 | 11.5 |  2.9 | 3.6 |  6.1 |  8.8 | 2.2 |  9.4 
15.9 | 8.8 |  9.8 | 11.5 | 12.3 | 3.7 |  8.9 | 13.0 | 7.9 | 11.7 
 6.2 | 6.9 | 12.8 | 13.7 |  2.7 | 3.5 |  8.3 | 15.9 | 5.1 |  6.0 


(a) Construct a stem-and-leaf display. 
(b) Construct a frequency histogram and interpret. 
(c) Construct a pie chart and interpret. 


1.4.15. The following data give the mean of SAT Mathematics scores by state for 1999 for 20 randomly 
selected states (source: The World Almanac and Book of Facts 2000). 


558 | 503 | 565 | 572 | 546 | 517 | 542 | 605 | 493 | 499 
568 | 553 | 510 | 525 | 595 | 502 | 526 | 475 | 506 | 568 


(a) Construct a stem-and-leaf display and interpret. 
(b) Construct a frequency histogram and interpret. 
(c) Construct a pie chart and interpret. 


1.4.16. A sample of 25 measurements is given here: 


 9 | 28 | 14 | 29 | 21 | 27 | 15 | 23 | 23 | 10 
31 | 23 | 16 | 26 | 22 | 17 | 19 | 24 | 21 | 20 
26 | 20 | 16 | 14 | 21 


(a) Make a frequency table displaying class intervals, frequencies, relative frequencies, and 
percentages. 
(b) Construct a frequency histogram and interpret. 


1.5 NUMERICAL DESCRIPTION OF DATA 


In the previous section we looked at some graphical and tabular techniques for describing a data set. 
We shall now consider some numerical characteristics of a set of measurements. Suppose that we 
have a sample with values $x_1, x_2, \ldots, x_n$. There are many characteristics associated with this data set, 
for example, the central tendency and variability. A measure of the central tendency is given by the 
sample mean, median, or mode, and the measure of dispersion or variability is usually given by the 
sample variance or sample standard deviation or interquartile range. 


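Definition 1.5.1 Let $x_1, x_2, \ldots, x_n$ be a set of sample values. Then the sample mean (or empirical 
mean) $\bar{x}$ is defined by 

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

The sample variance is defined by 

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2.$$

The sample standard deviation is 

$$s = \sqrt{s^2}.$$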


The sample variance $s^2$ and the sample standard deviation $s$ are both measures of the variability or 
“scatteredness” of data values around the sample mean $\bar{x}$. The larger the variance, the greater the spread. 
We note that $s^2$ and $s$ are both nonnegative. One question we may ask is “why not just take the sum 
of the differences $(x_i - \bar{x})$ as a measure of variation?” The answer lies in the following result, which 
shows that if we add up all deviations about the sample mean, we always get a zero value. 


Theorem 1.5.1 For a given set of measurements $x_1, x_2, \ldots, x_n$, let $\bar{x}$ be the sample mean. Then 

$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0.$$


Proof. Since $\bar{x} = (1/n) \sum_{i=1}^{n} x_i$, we have $\sum_{i=1}^{n} x_i = n\bar{x}$. Now 

$$\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - \sum_{i=1}^{n} \bar{x} = n\bar{x} - n\bar{x} = 0.$$


Thus although there may be a large variation in the data values, $\sum_{i=1}^{n} (x_i - \bar{x})$ as a measure of spread 
would always be zero, implying no variability. So it is not useful as a measure of variability. 


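Sometimes we can simplify the calculation of the sample variance $s^2$ by using the following 
computational formula: 

$$s^2 = \frac{1}{n-1} \left[ \sum_{i=1}^{n} x_i^2 - \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n} \right].$$
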
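As a quick numerical check, the sketch below computes the sample variance in two ways: from the definition (the sum of squared deviations from the mean divided by n - 1) and from the shortcut form, in which the sum of squares is reduced by the squared total divided by n before dividing by n - 1. The data values and function names are hypothetical and chosen only for illustration; exact fractions are used so that the comparison is not blurred by rounding.

```python
import math
from fractions import Fraction

def sample_variance(xs):
    """Definitional form: sum of squared deviations divided by n - 1."""
    n = len(xs)
    xbar = Fraction(sum(xs), n)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

def sample_variance_shortcut(xs):
    """Computational form: (sum of x_i^2 - (sum of x_i)^2 / n) / (n - 1)."""
    n = len(xs)
    total = sum(xs)
    return (sum(x * x for x in xs) - Fraction(total * total, n)) / (n - 1)

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 5, 7, 8]

xbar = Fraction(sum(data), len(data))
deviation_sum = sum(x - xbar for x in data)  # zero, as Theorem 1.5.1 asserts
s2 = sample_variance(data)
s = math.sqrt(s2)

print(xbar, deviation_sum, s2, sample_variance_shortcut(data), round(s, 3))
```

Both functions return the same value for any data set with at least two observations; the shortcut form needs only the running totals of the values and of their squares, which is why it is convenient for hand calculation.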

If the data set has a large variation with some extreme values (called outliers), the mean may not 
be a very good measure of the center. For example, average salary may not be a good indicator of 
the financial well-being of the employees of a company if there is a huge difference in pay between 
support personnel and management personnel. In that case, one could use the median as a measure
of the center: roughly 50% of the data fall below it and 50% above. The median is less sensitive to extreme
data values.


Definition 1.5.2 For a data set, the median is the middle number of the ordered data set. If the data set has 
an even number of elements, then the median is the average of the middle two numbers. The lower quartile 
is the middle number of the half of the data below the median, and the upper quartile is the middle number 
of the half of the data above the median. We will denote 


$Q_1$ = lower quartile
$Q_2$ = $M$ = middle quartile (median)
$Q_3$ = upper quartile


The difference between the upper and lower quartiles is called the interquartile range (IQR):
$IQR = Q_3 - Q_1$.
A possible outlier (mild outlier) will be any data point that lies below
$Q_1 - 1.5(IQR)$ or above $Q_3 + 1.5(IQR)$.
Note that the IQR is unaffected by the positions of those observations in the smallest 25% or the 
largest 25% of the data. 


Mode is another commonly used measure of central tendency. A mode indicates where the data tend 
to concentrate most. 




Definition 1.5.3 Mode is the most frequently occurring member of the data set. If all the data values are 
different, then by definition, the data set has no mode. 


Example 1.5.1 
The following data give the time in months from hire to promotion to manager for a random sample of 25 
software engineers from all software engineers employed by a large telecommunications firm. 


5 | 7 | 229 | 453 | 12 | 14 | 18 | 14 | 14 | 483
22 | 21 | 25 | 23 | 24 | 34 | 37 | 34 | 49 | 64
47 | 67 | 69 | 192 | 125


Calculate the mean, median, mode, variance, and standard deviation for this sample. 


Solution 
The sample mean is

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i = \frac{2082}{25} = 83.28 \text{ months}.$$


To obtain the median, first arrange the data in ascending order: 


5 | 7 | 12 | 14 | 14 | 14 | 18 | 21 | 22 | 23
24 | 25 | 34 | 34 | 37 | 47 | 49 | 64 | 67 | 69
125 | 192 | 229 | 453 | 483


Now the median is the thirteenth number which is 34 months. 
Since 14 occurs most often (thrice), the mode is 14 months. 
The sample variance is

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{24}\left[(5 - 83.28)^2 + \cdots + (125 - 83.28)^2\right] = 16{,}478,$$

and the sample standard deviation is $s = \sqrt{s^2} = 128.36$ months. Thus, we have sample mean $\bar{x} = 83.28$
months, median = 34 months, and mode = 14 months. Note that the mean is very different from the
other two measures of center because of a few large data values. Also, the sample variance is $s^2 = 16{,}478$
months squared, and the sample standard deviation is $s = 128.36$ months.
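
The numbers in Example 1.5.1 can be checked with a few lines of code. The following is a minimal Python sketch (Python is not one of the packages used in this book, and the variable names are illustrative choices). It applies Definition 1.5.1 directly, evaluates the computational formula for the variance, and confirms Theorem 1.5.1 up to floating-point rounding.

```python
import math
import statistics

# Months from hire to promotion (data of Example 1.5.1).
months = [5, 7, 229, 453, 12, 14, 18, 14, 14, 483,
          22, 21, 25, 23, 24, 34, 37, 34, 49, 64,
          47, 67, 69, 192, 125]

n = len(months)
mean = sum(months) / n

# Definition 1.5.1: sum of squared deviations divided by n - 1.
variance = sum((x - mean) ** 2 for x in months) / (n - 1)
std_dev = math.sqrt(variance)

# Computational formula: [sum of x^2 - (sum of x)^2 / n] / (n - 1).
variance_alt = (sum(x * x for x in months) - sum(months) ** 2 / n) / (n - 1)

# Theorem 1.5.1: the deviations about the mean add up to zero.
deviation_sum = sum(x - mean for x in months)

median = statistics.median(months)
mode = statistics.mode(months)

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.2f}")
```

Both variance expressions agree, which is why the computational formula is convenient when only the sums of the values and of their squares are available.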


Example 1.5.2 


For the data of Example 1.5.1, find lower and upper quartiles, median, and interquartile range (IQR). Check 
for any outliers. 




Solution 
Arrange the data in an ascending order. 


5 | 7 | 12 | 14 | 14 | 14 | 18 | 21 | 22 | 23
24 | 25 | 34 | 34 | 37 | 47 | 49 | 64 | 67 | 69
125 | 192 | 229 | 453 | 483


Then the median $M$ is the middle (13th) data value, $M = Q_2 = 34$. The lower quartile is the middle number
below the median, $Q_1 = (14 + 18)/2 = 16$. The upper quartile is $Q_3 = (67 + 69)/2 = 68$.

The interquartile range is $IQR = Q_3 - Q_1 = 68 - 16 = 52$.

To test for outliers, compute

$$Q_1 - 1.5(IQR) = 16 - 1.5(52) = -62$$

and

$$Q_3 + 1.5(IQR) = 68 + 1.5(52) = 146.$$

All data values that fall above 146 are possible outliers, and none is below $-62$. Therefore the outliers are 192,
229, 453, and 483.
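
The quartile rule of Definition 1.5.2 and the outlier check used above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `quartiles` is our own, and the rule coded here (the median of the values below and above the median position) follows Definition 1.5.2, whereas statistical packages may use slightly different quartile conventions.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q2, Q3) using the rule of Definition 1.5.2."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]             # values below the median position
    upper = xs[len(xs) - half:]   # values above the median position
    return (statistics.median(lower),
            statistics.median(xs),
            statistics.median(upper))

# Data of Example 1.5.1.
months = [5, 7, 229, 453, 12, 14, 18, 14, 14, 483,
          22, 21, 25, 23, 24, 34, 37, 34, 49, 64,
          47, 67, 69, 192, 125]

q1, q2, q3 = quartiles(months)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Possible outliers lie outside the interval [lower_fence, upper_fence].
outliers = [x for x in sorted(months) if x < lower_fence or x > upper_fence]

print(q1, q2, q3, iqr)
print(lower_fence, upper_fence, outliers)
```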


We have remarked earlier that the mean as a measure of central location is greatly affected by
extreme values or outliers. A robust measure of central location (a measure that is relatively unaffected
by outliers) is the trimmed mean. For $0 < \alpha < 0.5$, a $100\alpha\%$ trimmed mean is found as follows: Order
the data, and then discard the lowest $100\alpha\%$ and the highest $100\alpha\%$ of the data values. Find the mean
of the rest of the data values. We denote the $100\alpha\%$ trimmed mean by $\bar{x}_{\alpha}$. We illustrate the trimmed
mean concept in the following example.


Example 1.5.3 


For the data set representing the number of children in a random sample of 10 families in a neighborhood,
find the 10% trimmed mean ($\alpha = 0.1$).


1 2 2 3 2 3 9 1 6 2


Solution
Arrange the data in ascending order.


1 1 2 2 2 2 3 3 6 9


The data set has 10 elements. Discarding the lowest 10% (10% of 10 is 1) and the highest 10% of
the data values, we obtain the trimmed data set

1 2 2 2 2 3 3 6

The 10% trimmed mean is

$$\bar{x}_{0.1} = \frac{1 + 2 + 2 + 2 + 2 + 3 + 3 + 6}{8} = 2.625 \approx 2.6.$$

Note that the mean of these data without removing any observations is 3.1, which is
different from the trimmed mean.
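
The trimming steps of Example 1.5.3 can be written as a short Python sketch. The function name `trimmed_mean` is hypothetical, and the rule for how many values to drop when $\alpha n$ is not a whole number (truncation here) is an assumption, since the text only treats the case where it is an integer.

```python
def trimmed_mean(data, alpha):
    """Mean after discarding the lowest and highest 100*alpha% of the values."""
    xs = sorted(data)
    k = int(alpha * len(xs))      # number of values dropped from each end (assumed: truncate)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

# Number of children in 10 families (data of Example 1.5.3).
children = [1, 2, 2, 3, 2, 3, 9, 1, 6, 2]

plain_mean = sum(children) / len(children)
mean_10 = trimmed_mean(children, 0.1)

print(f"mean = {plain_mean:.1f}, 10% trimmed mean = {mean_10:.3f}")
```

Dropping the single largest value (9) and the single smallest value (1) pulls the center from 3.1 down to about 2.6, which illustrates the robustness of the trimmed mean.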
Examples 1.8.2 and 1.8.7 explain how to obtain a histogram using Minitab and SPSS, respectively.
Example 1.8.9 demonstrates the SAS commands to obtain the descriptive statistics. 


Although the standard deviation is the more popular measure, there are other measures of dispersion such
as the average deviation and the interquartile range. We have already seen the definition of the interquartile range.
The average deviation for a sample $x_1, \ldots, x_n$ is defined by

$$\text{Average deviation} = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}.$$

Calculation of the average deviation is simple and straightforward.
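
As a brief illustration of the average deviation, here is a Python sketch that reuses the children data of Example 1.5.3 (that choice of data, and the variable names, are ours rather than the book's).

```python
# Number of children in 10 families (data of Example 1.5.3).
children = [1, 2, 2, 3, 2, 3, 9, 1, 6, 2]

n = len(children)
mean = sum(children) / n

# Average deviation: mean of the absolute deviations about the sample mean.
average_deviation = sum(abs(x - mean) for x in children) / n

print(f"average deviation = {average_deviation:.2f}")
```

Unlike the plain sum of deviations in Theorem 1.5.1, the absolute values prevent positive and negative deviations from canceling.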


1.5.1 Numerical Measures for Grouped Data 


When we encounter situations where the data are grouped in the form of a frequency table (see
Section 1.4), we no longer have individual data values. Hence, we cannot use the formulas in Definition
1.5.1. The following formulas will give approximate values for $\bar{x}$ and $s^2$. Let the grouped data
have $l$ classes, with $m_i$ being the midpoint and $f_i$ being the frequency of class $i$, $i = 1, 2, \ldots, l$. Let
$n = \sum_{i=1}^{l} f_i$.

Definition 1.5.4 The mean for a sample of size $n$ is

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{l} f_i m_i,$$

where $m_i$ is the midpoint of class $i$ and $f_i$ is the frequency of class $i$.
Similarly, the sample variance is

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{l} f_i (m_i - \bar{x})^2 = \frac{\sum_{i=1}^{l} m_i^2 f_i - \frac{1}{n}\left(\sum_{i=1}^{l} f_i m_i\right)^2}{n-1}.$$

The following example illustrates how we calculate the sample mean and variance for grouped data.


Example 1.5.4 
The grouped data in Table 1.8 represent the number of children from birth through the end of the teenage 
years in a large apartment complex. Find the mean, variance, and standard deviation for these data: 


Table 1.8 Number of Children and Their Age Group

Class        0-3    4-7    8-11    12-15    16-19
Frequency     7      4      19      12        8
Solution
For simplicity of calculation we create Table 1.9.


Table 1.9
Class     $f_i$    $m_i$    $m_i f_i$    $m_i^2 f_i$
0-3         7      1.5      10.5        15.75
4-7         4      5.5      22          121
8-11       19      9.5      180.5       1714.75
12-15      12      13.5     162         2187
16-19       8      17.5     140         2450
$n = 50$    $\sum m_i f_i = 515$    $\sum m_i^2 f_i = 6488.5$


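The sample mean is

$$\bar{x} = \frac{1}{n} \sum_{i} f_i m_i = \frac{515}{50} = 10.3.$$

The sample variance is

$$s^2 = \frac{\sum_{i} m_i^2 f_i - \frac{1}{n}\left(\sum_{i} f_i m_i\right)^2}{n-1} = \frac{6488.5 - \frac{(515)^2}{50}}{49} = 24.16.$$

The sample standard deviation is $s = \sqrt{s^2} = \sqrt{24.16} = 4.92$.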
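
The column sums of Table 1.9 and the resulting statistics can be reproduced with the following Python sketch. It is a hedged illustration only: the list names are our own, and the class midpoints are taken from Table 1.9 as given.

```python
import math

# Class midpoints and frequencies from Table 1.9 (Example 1.5.4).
midpoints = [1.5, 5.5, 9.5, 13.5, 17.5]
freqs = [7, 4, 19, 12, 8]

n = sum(freqs)
sum_fm = sum(f * m for f, m in zip(freqs, midpoints))        # sum of m_i f_i
sum_fm2 = sum(f * m * m for f, m in zip(freqs, midpoints))   # sum of m_i^2 f_i

grouped_mean = sum_fm / n
grouped_var = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)
grouped_sd = math.sqrt(grouped_var)

print(n, sum_fm, sum_fm2)
print(f"mean = {grouped_mean}, variance = {grouped_var:.2f}, sd = {grouped_sd:.2f}")
```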


Using the following calculations, we can also find the median for grouped data. We only know that the
median occurs in a particular class interval, but we do not know the exact location of the median. We
will assume that the measurements are spread evenly throughout this interval. Let

$L$ = lower class limit of the interval that contains the median
$n$ = total frequency
$F_b$ = cumulative frequency for all classes before the median class
$f_m$ = frequency of the class interval containing the median
$w$ = interval width of the interval that contains the median

Then the median for the grouped data is given by

$$M = L + \frac{w}{f_m}(0.5n - F_b).$$


We proceed to illustrate with an example. 


Example 1.5.5 
For the data of Example 1.5.4, find the median. 


Solution 
First develop Table 1.10. 


Table 1.10

Class     $f_i$    Cumulative $f_i$    Cumulative $f_i/n$
0-3         7           7                0.14
4-7         4          11                0.22
8-11       19          30                0.60
12-15      12          42                0.84
16-19       8          50                1.00


The first interval for which the cumulative relative frequency exceeds 0.5 is the interval that contains the
median. Hence the interval 8 to 11 contains the median. Therefore, $L = 8$, $f_m = 19$, $n = 50$, $w = 3$,
and $F_b = 11$. Then, the median is

$$M = L + \frac{w}{f_m}(0.5n - F_b) = 8 + \frac{3}{19}\left((0.5)(50) - 11\right) = 10.21.$$
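
The search for the median class and the interpolation step can be sketched in Python as below. The function name `grouped_median` and the tuple layout are hypothetical, and the class width of 3 is taken from the worked example rather than derived from class boundaries.

```python
def grouped_median(classes):
    """Median for grouped data.

    classes: list of (lower_limit, width, frequency) tuples in increasing order.
    """
    n = sum(freq for _, _, freq in classes)
    before = 0  # cumulative frequency of all classes before the current one
    for lower, width, freq in classes:
        # The median class is the first whose cumulative relative
        # frequency exceeds 0.5.
        if (before + freq) / n > 0.5:
            return lower + (width / freq) * (0.5 * n - before)
        before += freq
    raise ValueError("no median class found")

# Classes of Example 1.5.4, with w = 3 as used in Example 1.5.5.
age_classes = [(0, 3, 7), (4, 3, 4), (8, 3, 19), (12, 3, 12), (16, 3, 8)]

median_estimate = grouped_median(age_classes)
print(f"grouped median = {median_estimate:.2f}")
```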


It is important to note that all the numerical measures we calculate for grouped data are only 
approximations to the actual values of the ungrouped data if they are available. 


One of the uses of the sample standard deviation will be clear from the following result, which is 
based on data following a bell-shaped curve. Such an indication can be obtained from the histogram 
or stem-and-leaf display. 


EMPIRICAL RULE 
When the histogram of a data set is “bell shaped” or “mound shaped,” and symmetric, the empirical rule 
states: 

1. Approximately 68% of the data are in the interval $(\bar{x} - s, \bar{x} + s)$.

2. Approximately 95% of the data are in the interval $(\bar{x} - 2s, \bar{x} + 2s)$.

3. Approximately 99.7% of the data are in the interval $(\bar{x} - 3s, \bar{x} + 3s)$.


The bell-shaped curve is called a normal curve and is discussed later in Chapter 3. A typical symmetric 
bell-shaped curve is given in Figure 1.5. 
FIGURE 1.5 Bell-shaped curve (normal distribution).
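
To see how the empirical rule behaves on mound-shaped data, the following Python sketch counts the fraction of observations within one, two, and three standard deviations of the mean. The data set is purely hypothetical: the value $j$ is repeated $\binom{10}{j}$ times for $j = 0, \ldots, 10$, which gives a symmetric, bell-like frequency pattern of 1024 values. The function name `fraction_within` is also our own.

```python
import math

def fraction_within(values, k):
    """Fraction of values strictly inside (mean - k*s, mean + k*s)."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    inside = [x for x in values if mean - k * s < x < mean + k * s]
    return len(inside) / n

# Hypothetical symmetric, mound-shaped data: value j occurs C(10, j) times.
data = [j for j in range(11) for _ in range(math.comb(10, j))]

fractions = {k: fraction_within(data, k) for k in (1, 2, 3)}
for k, frac in fractions.items():
    print(f"within {k} standard deviation(s): {frac:.1%}")
```

For these data the three fractions come out near, though not exactly at, 68%, 95%, and 99.7%, which is consistent with the rule being an approximation.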


1.5.2 Box Plots 


The sample mean or the sample standard deviation focuses on a single aspect of the data set, whereas 
histograms and stem-and-leaf displays express rather general ideas about data. A pictorial summary 
called a box plot (also called box-and-whisker plots) can be used to describe several prominent features 
of a data set such as the center, the spread, the extent and nature of any departure from symmetry, 
and identification of outliers. Box plots are a simple diagrammatic representation of the five number 
summary: minimum, lower quartile, median, upper quartile, maximum. Example 1.8.4 illustrates 
the method of obtaining box plots using Minitab. 


PROCEDURE TO CONSTRUCT A BOX PLOT 

1. Draw a vertical measurement axis and mark $Q_1$, $Q_2$ (median), and $Q_3$ on this axis as shown in
Figure 1.6.
2. Construct a rectangular box whose bottom edge lies at the lower quartile, $Q_1$, and whose upper
edge lies at the upper quartile, $Q_3$.
3. Draw a horizontal line segment inside the box through the median.
4. Extend the lines from each end of the box out to the farthest observation that is still within $1.5(IQR)$
of the corresponding edge. These lines are called whiskers.
5. Draw an open circle (or asterisk *) to identify each observation that falls between $1.5(IQR)$ and
$3(IQR)$ from the edge to which it is closest; these are called mild outliers.
6. Draw a solid circle to identify each observation that falls more than $3(IQR)$ from the closest edge;
these are called extreme outliers.
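
The quantities needed for the procedure above (box edges, whisker ends, and the two kinds of outliers) can be computed before any drawing is done. The Python sketch below is an illustration under assumptions: the function name `box_plot_summary`, the dictionary keys, and the small data set are all hypothetical, and the quartiles follow the rule of Definition 1.5.2.

```python
import statistics

def box_plot_summary(data):
    """Compute the numbers needed to draw a box plot by hand."""
    xs = sorted(data)
    half = len(xs) // 2
    q1 = statistics.median(xs[:half])
    q2 = statistics.median(xs)
    q3 = statistics.median(xs[len(xs) - half:])
    iqr = q3 - q1
    inner_lo, inner_hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # whisker limits
    outer_lo, outer_hi = q1 - 3 * iqr, q3 + 3 * iqr       # mild/extreme boundary
    inside = [x for x in xs if inner_lo <= x <= inner_hi]
    mild = [x for x in xs
            if outer_lo <= x < inner_lo or inner_hi < x <= outer_hi]
    extreme = [x for x in xs if x < outer_lo or x > outer_hi]
    return {"Q1": q1, "Q2": q2, "Q3": q3,
            "whiskers": (min(inside), max(inside)),
            "mild": mild, "extreme": extreme}

# Hypothetical data chosen to contain one mild and one extreme outlier.
sample = [10, 12, 13, 14, 15, 16, 17, 18, 19, 30, 50]
summary = box_plot_summary(sample)
print(summary)
```

Here the box runs from 13 to 19, the whiskers stop at 10 and 19, the value 30 would be drawn as an open circle, and the value 50 as a solid circle.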
FIGURE 1.6 A typical box-and-whiskers plot, with the whiskers, the mild outliers (between 1.5(IQR) and 3(IQR) from the box), and the extreme outliers (beyond 3(IQR)) labeled.


We illustrate the procedure with the following example. 


Example 1.5.6 
The following data identify the time in months from hire to promotion to chief pharmacist for a random 
sample of 25 employees from a certain group of employees in a large corporation of drugstores. 


5 | 7 | 229 | 453 | 12 | 14 | 18 | 14 | 14 | 483
22 | 21 | 25 | 23 | 24 | 34 | 37 | 34 | 49 | 64
47 | 67 | 69 | 192 | 125


Construct a box plot. Do the data appear to be symmetrically distributed along the measurement axis? 


Solution 
Referring to Example 1.5.2, we find that the median is $Q_2 = 34$.
The lower quartile is $Q_1 = (14 + 18)/2 = 16$.
The upper quartile is $Q_3 = (67 + 69)/2 = 68$.
The interquartile range is $IQR = 68 - 16 = 52$.
To find the outliers, compute
$$Q_1 - 1.5(IQR) = 16 - 1.5(52) = -62$$
and
$$Q_3 + 1.5(IQR) = 68 + 1.5(52) = 146.$$
Using these numbers, we follow the procedure outlined earlier to construct the box plot in Figure 1.7. The *
in the box plot represents an outlier. The first horizontal line is the first quartile, the second is the median,
and the third is the third quartile.


FIGURE 1.7 Box plot for months to promotion (vertical axis: months, from 0 to 500).


By examining the relative position of the median line (the middle line in Figure 1.7), we can check the
symmetry of the data. For example, in Figure 1.7, the median line is closer to the lower quartile than to
the upper quartile, which suggests that the distribution is slightly nonsymmetric. Also, since
$Q_3 + 3(IQR) = 68 + 3(52) = 224$, the procedure above gives one mild outlier (192) and three extreme outliers (229, 453, and 483).


EXERCISES 1.5 
1.5.1. The prices of 12 randomly chosen homes in dollars (approximated to nearest thousand) in 
a growing region of Tampa in the summer of 2002 are given below. 


176 105 133 140 305 215 207 210 173 150 78 96 


Find the mean and standard deviation of the sampled home prices from this area. 
1.5.2. The following is a sample of nine mortgage companies’ interest rates for 30-year home 
mortgages, assuming 5% down. 


7.625 7.500 6.625 7.625 6.625 6.875 7.375 5.375 7.500 


(a) Find the mean and standard deviation and interpret. 
(b) Find lower and upper quartiles, median, and interquartile range. Check for any outliers 
and interpret. 


1.5.3. For four observations, it is given that mean is 6, median is 4, and mode is 3. Find the standard 
deviation of this sample. 
1.5.4. The data given below pertain to a random sample of disbursements of state highway funds
(in millions of dollars) to different states.


1188 | 1050 | 2882 | 2802 | 780 | 1171 | 685
537 | 519 | 2523 | 316 | 1117 | 1578 | 261


(a) Find the mean, variance, and range for these data and interpret.

(b) Find lower and upper quartiles, median, and interquartile range. Check for any outliers
and interpret.

(c) Construct a box plot and interpret.


1.5.5. Maximal static inspiratory pressure (PImax) is an index of respiratory muscle strength. The
following data show the measure of PImax (cm H2O) for 15 cystic fibrosis patients.


105 | 80 | 115 | 95 | 100 | 85 | 90 | 70
135 | 105 | 45 | 115 | 40 | 115 | 95


(a) Find the lower and upper quartiles, median, and interquartile range. Check for any
outliers and interpret.

(b) Construct a box plot and interpret.

(c) Are there any outliers?


1.5.6. Compute the mean, variance, and standard deviation for the data in Table 1.5.1 (assume
that the data belong to a sample).


Table 1.5.1
Class        0-4    5-9    10-14    15-19    20-24
Frequency     a     14      15       10        6


1.5.7. (a) For any grouped data with $l$ classes with group frequencies $f_i$ and class midpoints $m_i$,
show that

$$\sum_{i=1}^{l} f_i (m_i - \bar{x}) = 0.$$

(b) Verify this result for the data given in Exercise 1.5.6.


1.5.8. (a) Given the sample values $x_1, x_2, \ldots, x_n$, show that

$$\sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2.$$

(b) Verify the result of part (a) for the data of Exercise 1.5.5.


1.5.9. The following are the closing prices of some securities that a mutual fund holds on a certain 
day: 


10.25 | 5.31 | 11.25 | 13.13 | 18.00 | 32.56 | 37.06 | 39.00 
43.25 | 45.00 | 40.06 | 28.56 | 22.75 | 51.50 | 47.00 | 53.50 
32.00 | 25.44 | 22.50 | 30.00 | 24.75 | 53.37 | 51.38 | 26.00 
53.50 | 29.87 | 32.00 | 28.87 | 42.19 | 37.50 | 30.44 | 41.37 


(a) Find the mean, variance, and range for these data and interpret. 

(b) Find lower and upper quartiles, median, and interquartile range. Check for any outliers. 

(c) Construct a box plot and interpret. 

(d) Construct a histogram. 

(e) Locate on your histogram $\bar{x}$, $\bar{x} \pm s$, $\bar{x} \pm 2s$, and $\bar{x} \pm 3s$. Count the data points in each of
the intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, and $\bar{x} \pm 3s$ and compare this with the empirical rule.


1.5.10. The radon concentration (in pCi/liter) data obtained from 40 houses in a certain area are 
given below. 


2.9 | 0.6 | 13.5 | 17.1 | 2.8 | 3.8 | 16.0 | 2.1 | 6.4 | 17.2
7.9 | 0.5 | 13.7 | 11.5 | 2.9 | 3.6 | 6.1 | 8.8 | 2.2 | 9.4
15.9 | 8.8 | 9.8 | 11.5 | 12.3 | 3.7 | 8.9 | 13.0 | 7.9 | 11.7
6.2 | 6.9 | 12.8 | 13.7 | 2.7 | 3.5 | 8.3 | 15.9 | 5.1 | 6.0


(a) Find the mean, variance, and range for these data. 

(b) Find lower and upper quartiles, median, and interquartile range. Check for any outliers. 

(c) Construct a box plot. 

(d) Construct a histogram and interpret. 

(e) Locate on your histogram $\bar{x}$, $\bar{x} \pm s$, $\bar{x} \pm 2s$, and $\bar{x} \pm 3s$. Count the data points in each of the
intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, and $\bar{x} \pm 3s$. How do these counts compare with the empirical
rule?


1.5.11. A random sample of 100 households’ weekly food expenditure represented by x from a 
particular city gave the following statistics: 


$$\sum x_i = 11{,}000 \quad \text{and} \quad \sum x_i^2 = 1{,}900{,}000.$$


(a) Find the mean and standard deviation for these data. 

(b) Assuming that the food expenditure of the households of an entire city of 400,000 will 
have a bell-shaped distribution, how many households of this city would you expect to 
fall in each of the intervals $\bar{x} \pm s$, $\bar{x} \pm 2s$, and $\bar{x} \pm 3s$?


1.5.12. The following numbers are the hours put in by 10 employees of a company in a randomly
selected week: 
40 46 40 54 18 45 34 60 39 42 


(a) Calculate the values of the three quartiles and the interquartile range. Also, calculate 
the mean and standard deviation and interpret. 
(b) Verify for this data set that $\sum_{i=1}^{n} (x_i - \bar{x}) = 0$.
(c) Construct a box plot. 
(d) Does this data set contain any outliers? 


1.5.13. For the following data: 


6.3 | 2.9 | 4.5 | 1.1 | 1.8 | 4.0 | 1.2 | 3.1 | 2.0 | 4.0
7.0 | 2.8 | 4.3 | 5.3 | 2.9 | 8.3 | 4.4 | 2.8 | 3.1 | 5.6
4.5 | 4.5 | 5.7 | 0.5 | 6.2 | 3.7 | 0.9 | 2.4 | 3.0 | 3.5


(a) Find the mean, variance, and standard deviation. 

(b) Construct a frequency table with five classes. 

(c) Using the grouped data formula, find the mean, variance, and standard deviation for 
the frequency table constructed in part (b) and compare it to the results in part (a). 


1.5.14. In order to assess the protective immunizing activity of various whooping cough vaccines, 
suppose that 30 batches of different vaccines are tested on groups of children. Suppose that 
the following data give immunity percentage in home exposure values (IPHE values). 


85 | 51 | 41 | 90 | 91 | 40 | 39 | 69 | 45 | 47 
42 | 12 | 70 | 38 | 97 | 34 | 94 | 77 | 88 | 91
79 | 90 | 43 | 40 | 89 | 85 | 71 | 30 | 25 | 21 


(a) Find the mean, variance, and standard deviation and interpret. 

(b) Construct a frequency table with five classes. 

(c) Using the grouped data formula, find the mean, variance, and standard deviation for 
the table in part (b) and compare it to the results in part (a). 


1.5.15. The grouped data in Table 1.5.2 give the number of births by age group of mothers between
ages 10 and 39 in a certain state in 2000.


Table 1.5.2
Age of mother    Number of births
10-14                 895
15-19              55,373
20-24             122,591
25-29             139,615
30-34             127,502
35-39              68,685


Find the median for this grouped data and interpret.


1.5.16. Table 1.5.3 gives the distribution of the masses (in grams) of 50 salmon from a single young
cohort.


Table 1.5.3
Weight       155-164    165-174    175-184    185-194    195-204
Frequency       8          11         18          9          4


(a) Using the grouped data formula, find the mean, variance, and standard deviation.
(b) Find the median for this grouped data.

1.5.17. After a pollution accident, 180 dead fish were recovered from a stream. Table 1.5.4 gives 
their lengths measured to the nearest millimeter. 


Table 1.5.4 


Length of Fish(mm) 1-19 20-39 40-59 60-79 80-99 


Frequency 38 31 59 45 7 


(a) Using the grouped data formula, find the mean, variance, and standard deviation. 
(b) Find the median for this grouped data and interpret. 


1.6 COMPUTERS AND STATISTICS 


With present-day technology, we can automate most statistical calculations. For small sets of data and
for many basic calculations, such as finding means and standard deviations and creating simple charts,
graphing calculators are sufficient. Students should learn how to perform statistical analysis using
their handheld calculators. For deeper analysis and for large data sets, statistical software is necessary.
Software also provides easier data entry and editing and much better graphics in comparison to 
calculators. There are many statistical packages available. Many such analyses can be performed with 
spreadsheet application programs such as Microsoft Excel, but a more thorough data analysis requires 
the use of more sophisticated software such as Minitab and SPSS. For students with programming 
abilities, packages such as MATLAB may be more appealing. For very large data sets and for complicated 
data analysis, one could use SAS. SAS is one of the most frequently used statistical packages. Many 
other statistical packages (such as R, Splus, and StatXact) are available; the utilities and advantages 
of each are based on the specific application and personal taste. For example, R is free software that 
is being increasingly used by statisticians and can be downloaded from http://www.r-project.org/, 
and a statistical tutorial for R can be found at http://www.biometrics.mtu.edu/CRAN/. For a good 
introduction to doing statistics with R, refer to the book by Peter Dalgaard, Introductory Statistics with
R, Springer, 2002.


In this book, we will give some representative Minitab, SPSS, and SAS commands at the end of each 
chapter just to get students started on the technology. These examples are by no means a tutorial for 
the respective software. For a more thorough understanding and use of technology, students should
look at the users’ manual that comes with the software or at references given at the end of the book. 
The computer commands are designed to be illustrative, rather than completely efficient. In dealing 
with data analysis for real-world problems, we need to know which statistical procedure to use, 
how to prepare the data sets suitable for use in the particular statistical package, and finally how 
to interpret the results obtained. A good knowledge of theory supplemented with a good working 
knowledge of statistical software will enable students to perform sophisticated statistical analysis, 
while understanding the underlying assumptions and the limitations of results obtained. This will
prevent us from drawing misleading conclusions when using computer-generated statistical outputs.


1.7 CHAPTER SUMMARY 


In this chapter, we dealt with some basic aspects of descriptive statistics. First we gave basic definitions 
of terms such as population and sample. Some sampling techniques were discussed. We learned about 
some graphical presentations in Section 1.4. In Section 1.5 we dealt with descriptive statistics, in 
which we learned how to find mean, median, and variance and how to identify outliers. A brief 
discussion of the technology and statistics was given in Section 1.6. All the examples given in this 
chapter are for a univariate population, in which each measurement consists of a single value. Many 
populations are multivariate, where measurements consist of more than one value. For example, we 
may be interested in finding a relationship between blood sugar level and age, or between body height 
and weight. These types of problems will be discussed in Chapter 8. 


In practice, it is always better to run descriptive statistics as a check on one’s data. The graphical and 
numerical descriptive measures can be used to verify that the measurements are sound and that there 
are no obvious errors due to collection or coding. 


We now list some of the key definitions introduced in this chapter. 


■ Population
■ Sample
■ Statistical inference
■ Quantitative data
■ Qualitative or categorical data
■ Cross-sectional data
■ Time series data
■ Simple random sample
■ Systematic sample
■ Stratified sample
■ Proportional stratified sampling
■ Cluster sampling
■ Multiphase sampling
■ Relative frequency
■ Cumulative relative frequency
■ Bar graph
■ Pie chart
■ Histogram
■ Sample mean
■ Sample variance
■ Sample standard deviation
■ Median
■ Interquartile range
■ Mode
■ Mean
■ Empirical rule
■ Box plots


In this chapter, we have also introduced the following important concepts and procedures: 


■ General procedure for data collection
■ Some advantages of simple random sampling
■ Steps for selecting a stratified sample
■ Procedures to construct frequency and relative frequency tables and graphical representations
such as stem-and-leaf displays, bar graphs, pie charts, histograms, and box plots
■ Procedures to calculate measures of central tendency, such as mean and median, as well as
measures of dispersion such as the variance and standard deviation for both ungrouped and
grouped data
■ Guidelines for the construction of frequency tables and histograms
■ Procedures to construct a box plot


1.8 COMPUTER EXAMPLES 


In this section, we give some examples of how to use Minitab, SPSS, and SAS for creating graphical 
representations of the data as well as methods for the computation of basic statistics. Sometimes, the 
outputs obtained using a particular software package may not be exactly as explained in the book; they 
vary from one package to another, and also depend on the particular software version. It is important to 
obtain the explanation of outputs from the help menu of the particular software package for complete 
understanding. The “Computer Examples” sections of this book are not designed as manuals for the 
software, nor are they written in the most efficient way. The idea is only to introduce some basic 
procedures, so that the students can get started with applying the theoretical material they have seen 
in each of the chapters. 


1.8.1 Minitab Examples 


A good place to get help on Minitab is http://www.minitab.com/resources/. There are many nice 
sites available on Minitab procedures; for example, Minitab student tutorials can be obtained from 
http://www.minitab.com/resources/tutorials/. Here we illustrate only some of the basic uses of
Minitab. In Minitab, we can enter the data in the spreadsheet and use the Windows pull-down menus, 
or we can directly enter the data and commands. We will mostly give procedures for the pull-down 
menus only. It is up to the user’s taste to choose among these procedures. It should be noted that 
with different versions of Minitab, there will be some differences in the pull-down menu options. It 
is better to consult the Help menu for the actual procedure. 
Example 1.8.1 (Stem-and-Leaf):
For the following data, construct a stem-and-leaf display using Minitab: 


78 | 74 | 82 | 66 | 94 | 71 | 64 | 88 | 55 | 80 
91 | 74 | 82 | 75 | 96 | 78 | 84 | 79 | 71 | 83 


Solution 
For the pull-down menu, first enter the data in column 1. Then use the following sequence. The boldface
represents the actions.


Graph > Character Graphs > Stem-and-Leaf
In Variables: type C1 and click OK


We will get the following output: 


Stem-and-Leaf of C1   N = 20
Leaf Unit = 1.0

  1    5  5
  2    6  4
  3    6  6
  7    7  1144
 (4)   7  5889
  9    8  02234
  4    8  8
  3    9  14
  1    9  6


The following are the explanations of each column in the stem-and-leaf display, as given in the Minitab Help 
menu. The display has three columns: 


Left: Cumulative count of values from the top of the figure down and from the bottom of the figure up 
to the middle. 


Middle number in parentheses (stem): Count of values in the row containing the median. Parenthe- 
ses around the median row are omitted if the median falls between two lines of the display. 


Right (leaves): Each value is a single digit to place after the stem digits, representing one data value.
The leaf unit tells you where to put the decimal place in each number. 
Note that this display is a little different from the one we explained in Section 1.4. However, if we combine
the stems and the corresponding leaves, we will get the representation as in Section 1.4.


Example 1.8.2 (Histogram):
For the following data, construct a histogram: 


25 | 37 | 20 | 31 | 31 | 21 | 12 | 25 | 36 | 27
38 | 16 | 40 | 32 | 33 | 24 | 39 | 26 | 27 | 19 


Solution 
Enter the data in C1, then use the following sequence 


Graph > Histogram... > in Graph variables: type C1 > OK


We will get the histogram as shown in Figure 1.8. 


FIGURE 1.8 Histogram for data of Example 1.8.2 (vertical axis: frequency; horizontal axis from 10 to 40).


If we want to change the number of intervals, after entering Graph variables, click Options..., click
Number of intervals, and enter the desired number; then click OK.


Example 1.8.3 (Descriptive Statistics): 
In this example, we will describe how to obtain basic statistics such as mean, median, and standard deviation
for the following data:

5 | 7 | 229 | 453 | 12 | 14 | 18 | 14 | 14 | 483
22 | 21 | 25 | 23 | 24 | 34 | 37 | 34 | 49 | 64
47 | 67 | 69 | 192 | 125


Solution 
Enter the data in C1. Then use 


Stat > Basic Statistics > Display Descriptive Statistics... > in Variables: type C1 > click OK


We will get the following output: 


Variable    N     Mean    Median    TrMean    StDev    SEMean
C1         25     83.3     34.0      69.3     128.4     25.7

Variable    Minimum    Maximum     Q1      Q3
C1            5.0       483.0     16.0    68.0


Here, TrMean represents the trimmed mean. A 5% trimmed mean is calculated. Minitab removes the smallest 
5% and the largest 5% of the values (rounded to the nearest integer) and then averages the remaining values. 
Also, SE Mean gives the standard error of the mean. It is calculated as StDev/SQRT (N), where StDev is the 
standard deviation. 
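
The TrMean and SEMean entries in this output can be reproduced from the descriptions just given. The following Python sketch is only an illustration and is not Minitab code: it assumes, as stated above, that 5% of the values (rounded to the nearest integer) are removed from each end, and the variable names are our own.

```python
import math

# Data of Example 1.8.3.
months = [5, 7, 229, 453, 12, 14, 18, 14, 14, 483,
          22, 21, 25, 23, 24, 34, 37, 34, 49, 64,
          47, 67, 69, 192, 125]

n = len(months)
mean = sum(months) / n
st_dev = math.sqrt(sum((x - mean) ** 2 for x in months) / (n - 1))

# 5% trimmed mean: 0.05 * 25 = 1.25, which rounds to 1 value from each end.
k = round(0.05 * n)
kept = sorted(months)[k:n - k]
tr_mean = sum(kept) / len(kept)

# Standard error of the mean: StDev / sqrt(N).
se_mean = st_dev / math.sqrt(n)

print(f"TrMean = {tr_mean:.1f}, StDev = {st_dev:.1f}, SEMean = {se_mean:.1f}")
```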

Example 1.8.4 (Sorting and Box Plot): 
For the following data, first sort in the increasing order and then construct a box plot to check for outliers. 


870 | 922 | 1146 | 1120 | 1079 | 905 | 888 | 865 | 1112 | 966
1150 | 977 | 958 | 1088 | 1139 | 1055 | 1082 | 1053 | 1048 | 1118 
866 | 996 | 1102 | 1028 | 1130 | 1002 | 990 | 1052 | 1116 | 1109 


Solution 
After entering the data in C1, we can sort the data in increasing order as follows: 


Manip > Sort... > in Sort column(s): type C1 > in Store sorted column(s) in: type C2 > in Sorted 
by column: type C1 > OK 


In column C2, we will get the following sorted data: 

C2 
865 866 870 888 905 922 958 966 977 990 996 
1002 1028 1048 1052 1053 1055 1079 1082 1088 1102 1109 
1112 1116 1118 1120 1130 1139 1146 1150


If we want to draw a box plot for the data, do the following: 
Graph > Box plot... > in Graph variables: under Y, type C1 > OK


We will get the box plot as shown in Figure 1.9. 


FIGURE 1.9 Box plot for the data of Example 1.8.4 (vertical axis from 850 to 1150).


Example 1.8.5 (Test of Randomness):

Almost all of the analyses in this book assume that the sample is random. How can we verify whether the 
sample is really random? Project 12B explains a procedure called run test. Without going into details, this 
test is simple with Minitab. All we have to do is enter the data in C1. Then click 


Stat > Nonparametric > Runs Test... > in variables: enter C1 > OK 


For instance, if we have the following data: 


24 | 31 | 28 | 43 | 28 | 56 | 48 | 39 | 52 | 32 
38 | 49 | 51 | 49 | 62 | 33 | 41 | 58 | 63 | 56 


we will get following output: 


Run Test
C1
K = 44.0500

The observed number of runs = 14
The expected number of runs = 11.0000
10 Observations above K 10 below
* N Small -- The following approximation may be invalid
The test is significant at 0.1681
Cannot reject at alpha = 0.05

“Cannot reject” in the output means that it is reasonable to assume that the sample is random. For any
data, it is always desirable to do a run test to determine the randomness.
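
The quantities in this output can be reproduced with a short Python sketch of a runs test about the mean. It is an illustration, not Minitab's implementation: the two-sided p-value uses the usual normal approximation, and the data are assumed to have been entered in C1 by reading the table column by column, since that ordering reproduces the 14 observed runs (reading row by row gives a different run count).

```python
import math

# Data of Example 1.8.5, listed column by column from the table
# (assumption: this is the order in which the values were entered in C1).
data = [24, 38, 31, 49, 28, 51, 43, 49, 28, 62,
        56, 33, 48, 41, 39, 58, 52, 63, 32, 56]

k = sum(data) / len(data)            # cut point K: the sample mean
above = [x > k for x in data]        # True if the observation exceeds K
n1 = sum(above)                      # number of observations above K
n2 = len(data) - n1                  # number of observations at or below K

# A new run starts whenever consecutive observations fall on different sides of K.
runs = 1 + sum(a != b for a, b in zip(above, above[1:]))

expected = 2 * n1 * n2 / (n1 + n2) + 1
variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
            / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
z = (runs - expected) / math.sqrt(variance)
p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal approximation

print(k, runs, expected, round(p_value, 4))
```

With only 20 observations the normal approximation is rough, which is what the "N Small" warning in the output refers to.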


1.8.2 SPSS Examples 
For SPSS, we will give only Windows commands. For all the pull-down menus, the sequence will be 
separated by the > symbol. 




Example 1.8.6 
Redo Example 1.8.1 with SPSS. 


Solution 
After entering the data in C1, 


Analyze > Descriptive Statistics > Explore... > 
At the Explore window select the variable and move to Dependent List; then click Plots..., select 
Stem-and-Leaf, click Continue, and click OK at the Explore Window 


We will get the output with a few other things, including box plots along with the stem-and-leaf display,
which we will not show here.


Example 1.8.7 
Redo Example 1.8.2 with SPSS. 


Solution 
After entering the data: 


Graphs > Histogram... >
At the Histogram window select the variable and move to Variable, and click OK 


We will get the histogram, which we will not display here. 




Example 1.8.8 
Redo Example 1.8.3 with SPSS. 
Solution
Enter the data. Then: 


Analyze > Descriptive Statistics > Frequencies... > 

At the Frequencies window select the variable(s); then open the Statistics window and check 
whichever boxes you desire under Percentile, Dispersion, Central Tendency, and Distribution > 
continue > OK 


For example, if we select Mean, Median, Mode, Standard Deviation, and Variance, we will get the following
output and more:


Statistics
VAR00001
N Valid           25
  Missing         0
Mean              83.2800
Median            34.0000
Mode              14.00
Std. Deviation    128.36488
Variance          16477.54333


1.8.3 SAS Examples 


We will now give some SAS procedures describing the numerical measures of a single variable. PROC 
UNIVARIATE will give mean, median, mode, standard deviation, skewness, kurtosis, etc. If we do 
not need median, mode, and so on, we could just as well use PROC MEANS in lieu of PROC 
UNIVARIATE. We can use the following general format in writing SAS programs with appropriate 
problem-specific modifications. There are many good online references as well as books available for 
SAS procedures. To get support on SAS, including many example codes, refer to the SAS support Web 
site: http://support.sas.com/. Another helpful site can be found at http://www.ats.ucla.edu/stat/sas/. 
There are many other sites that may suit your particular application. 


GENERAL FORMAT OF AN SAS PROGRAM 

DATA give a name to the data set; 

INPUT here we put variable names and column locations, if there are more than one variable; 
CARDS; (also we can use DATALINES;) 

Enter the data here; 

TITLE 'here we include the title of our analysis';

PROC PRINT; 

PROC name of procedure (such as PROC UNIVARIATE) goes here; 
Options that we may want to include (such as the variables 

to be used) go here; 

RUN; 
After writing an SAS program, to execute it we can go to the menu bar and select Run > Submit, or click
the “running man” icon. On execution, SAS will output the results to the Output window. All the 
steps used including time of execution and any error messages will be given in the Log window. 


In order to make the SAS outputs more manageable, we can use the following SAS command at the 
beginning of an SAS program: 
options ls=80 ps=50;


ls stands for line size, and this sets each line to be 80 characters wide. ps stands for page size and
allows 50 lines on each page. This reduces the number of unnecessary page breaks. To suppress the
date and page number in the output, we can use the options command:


Options nodate nonumber; 




Example 1.8.9 
For the data of Example 1.8.3, use PROC UNIVARIATE to summarize the data. 


Solution 
In the program editor window, type the following if you are entering the data directly. If you are using the 
data stored in a file, the comment line (with *) should be used instead of the input and data lines. 


Options nodate nonumber;
DATA ex9;
INPUT ex9 @@;
DATALINES;
5 7 229 453 12 14 18 14 14 483
22 21 25 23 24 34 37 34 49 64
47 67 69 192 125
;
PROC UNIVARIATE;
RUN;


In this case we will get the following output: 


The UNIVARIATE Procedure 
Variable: ex9 


Moments
N                        25    Sum Weights              25
Mean                  83.28    Sum Observations       2082
Std Deviation    128.364884    Variance         16477.5433
Skewness         2.45719194    Kurtosis         5.47138396
Uncorrected SS       568850    Corrected SS      395461.04
Coeff Variation  154.136508    Std Error Mean   25.6729767


Basic Statistical Measures 


Location Variability 
Mean 83.28000 Std Deviation 128.36488 
Median 34.00000 Variance 16478 
Mode 14.00000 Range 478.00000 

Interquartile Range 49.00000 

Tests for Location: Mu0=0
Test -Statistic- -p Value-
Student’s t t 3.243878 Pr > |t| 0.0035
Sign M 12.5 Pr >= |M| <.0001
Signed Rank S 162.5 Pr >= |S| <.0001


Quantiles (Definition 5)
Quantile Estimate


100% Max 483
99% 483 
95% 453 
90% 229 
75% Q3 67 
50% Median 34 
25% Q1 18 
10% 12 
5% 7 
1% 5 
0% Min 5 


The UNIVARIATE Procedure 
Variable: ex9 
Extreme Observations 


-Lowest- -Highest- 
Value Obs Value Obs 
5 1 125 25 
7 2 192 24 
12 5 229 3 
14 9 453 4 
14 8 483 10 


We can observe from the previous output that PROC UNIVARIATE gives much information about the data, 
such as mean, standard deviation, and quartiles. If we do not want all these details, we could use the PROC 
MEANS command. In the previous code, if we replace PROC UNIVARIATE by the PROC MEANS statement, 
we will get the following: 


The MEANS Procedure 
Analysis Variable : ex9 


N Mean Std Dev Minimum Maximum 


The output is greatly simplified. 
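
As a cross-check on the SAS summaries, the same basic measures can be computed with Python's standard library. This is a hedged sketch rather than a substitute for PROC UNIVARIATE: it covers only the mean, median, mode, sample standard deviation, variance, and standard error of the mean, and the 25 observations listed are the ones consistent with the printed N, sum, uncorrected sum of squares, and extreme observations.

```python
import math
import statistics

# The 25 observations summarized in Example 1.8.9.
data = [5, 7, 229, 453, 12, 14, 18, 14, 14, 483,
        22, 21, 25, 23, 24, 34, 37, 34, 49, 64,
        47, 67, 69, 192, 125]

n = len(data)
mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)
std_dev = statistics.stdev(data)       # sample standard deviation (divisor n - 1)
variance = statistics.variance(data)   # sample variance (divisor n - 1)
se_mean = std_dev / math.sqrt(n)       # standard error of the mean: StDev / sqrt(N)

print(n, mean, median, mode)
print(round(std_dev, 6), round(variance, 4), round(se_mean, 7))
```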
If we use PROC UNIVARIATE PLOT NORMAL; the procedure will also produce three plots: stem-and-leaf, box plot,
and normal probability plot (this will be discussed later in the text). In order to obtain bar graphs at the
midpoints of the class intervals, use the following commands:


PROC CHART DATA=ex9; 
VBAR ex9; 


If we want to create a frequency table, use the following: 


PROC FREQ; 
table ex9; 
title 'Frequency tabulation';


Every PROC or procedure has its own name and options. We will use different PROCs as we need them. 
Always remember to enclose titles in single quotes. There are various other actions that we can perform 
for the data analysis using SAS. It is beyond the scope of this book to explain general and efficient SAS 
codes. For details, we refer to books dedicated to SAS, such as the book by Ronald P. Cody and Jeffrey K. 
Smith, Applied Statistics and the SAS Programming Language, 5th Edition, Prentice Hall, 2006. There are 
many Web sites that give SAS codes. One example with references for many aspects of SAS, including 
many codes, can be found at http://www.sas.com/service/library/onlinedoc/code.samples.html. 


EXERCISES 1.8 


1.8.1. The following data represent the lengths (to the nearest whole millimeter) of 80 shoots from 
seeds of a certain type planted at the same time. 


75 | 72 | 76 | 76 | 72 | 74 | 71 | 75 | 77 | 72 
74 | 71 | 76 | 76 | 76 | 72 | 71 | 73 | 73 | 71 
72 | 72 | 75 | 70 | 74 | 74 | 78 | 74 | 76 | 79 
75 | 76 | 73 | 73 | 71 | 72 | 79 | 74 | 77 | 72 
76 | 70 | 72 | 75 | 78 | 72 | 69 | 75 | 72 | 71 
77 | 79 | 76 | 73 | 75 | 73 | 72 | 75 | 74 | 78
73 | 77 | 73 | 77 | 70 | 74 | 66 | 74 | 73 | 77 
75 | 79 | 75 | 70 | 72 | 73 | 80 | 73 | 78 | 75 


Using one of the software packages (Minitab, SPSS, or SAS): 

(a) Represent the data in a histogram. 

(b) Find the summary statistics such as mean, median, variance, and standard deviation. 
(c) Draw box plots and identify any outliers. 
1.8.2. On a particular day, when asked, “How many minutes did you exercise today?” the following were
the responses of 30 randomly selected people:


15 | 30 | 25 | 10 | 30 | 15 | 10 | 45 | 20 | 22
18 | 0 | 45 | 12 | 15 | 10 | 17 | 30 | 30 | 15
10 | 30 | 20 | 8 | 18 | 30 | 27 | 33 | 15 | 0


Using one of the software packages (Minitab, SPSS, or SAS): 

(a) Represent the data in a histogram. 

(b) Find the summary statistics such as mean, median, variance, and standard deviation. 
(c) Draw box plots and identify any outliers. 


PROJECTS FOR CHAPTER 1 
1A. World Wide Web and Data Collection 


Statistical Abstracts of the United States is a rich source of statistical data
(http://www.census.gov/prod/www/statistical-abstract-us.html). Pick any category of interest to you and obtain data
(say, Income, Expenditures, and Wealth). Represent a section of the data graphically. Find mean, 
median, and standard deviation. Identify any outliers. There are many other sites, such as 
http://lib.stat.cmu.edu/datasets/ and http://it.stlawu.edu/~rlock/datasurf.html, that we can use for 
obtaining real data sets. 


1B. Preparing a List of Useful Internet Sites 
Prepare a list of Internet references for various aspects of statistical study. 


1C. Dot Plots and Descriptive Statistics 

From the local advertisements of apartments for rent, randomly pick 50 monthly rents for two- 
bedroom apartments. For these data, first draw a dot plot and then obtain descriptive statistics (use 
Minitab, SPSS, or SAS, or any other statistical software). 




Chapter 2

Basic Concepts from Probability Theory


Objective: In this chapter we will review some results from probability theory that are essential for 
the development of the statistical results of this book. 
Andrei Kolmogorov (1903-1987) laid the mathematical foundations of probability theory and the
theory of randomness. His monograph Grundbegriffe der Wahrscheinlichkeitsrechnung, published in 
1933, introduced probability theory in a rigorous way from fundamental axioms. He later used 
probability theory to study the motion of the planets and the turbulent flow of air from a jet 
engine. He also made important contributions to stochastic processes, information theory, statis- 
tical mechanics, and nonlinear dynamics. Kolmogorov had numerous interests outside mathematics. 
In particular, he was interested in the form and structure of the poetry of the Russian author 
Pushkin. 


2.1 INTRODUCTION 


Probability theory provides a mathematical model for the study of randomness and uncertainty. 
The concept of probability occupies an important role in the decision-making process, whether the 
problem is one faced in business, in engineering, in government, in sciences, or just in one’s own 
everyday life. Most decisions are made in the face of uncertainty. The mathematical models of prob- 
ability theory enable us to make predictions about certain mass phenomena from the necessarily 
incomplete information derived from sampling techniques. It is the probability theory that enables 
one to proceed from descriptive statistics to inferential statistics. In fact, probability theory is the most 
important tool in statistical inference. 


The origin of probability theory can be traced to the modeling of games of chance such as dealing
from a deck of cards or spinning a roulette wheel. The earliest results on probability arose from
the collaboration of the eminent mathematicians Blaise Pascal and Pierre de Fermat and a gambler,
Chevalier de Méré. They were interested in what seemed to be contradictions between mathematical
calculations and actual games of chance, such as throwing dice, tossing a coin, or spinning a
roulette wheel. For example, in repeated throws of a die, it was observed that each number, 1 to 6, 
appeared with a frequency of approximately 1/6. However, if two dice are rolled, the sum of num- 
bers showing on two dice, that is, 2 to 12, did not appear equally often. It was then recognized 
that, as the number of throws increased, the frequency of these possible results could be predicted 
by following some simple rules. Similar basic experiments were conducted using other games of 
chance, which resulted in the establishment of various basic rules of probability. Probability theory 
was developed solely to be applied to games of chance until the 18th century, when Pierre Laplace 
and Karl F. Gauss applied the basic probabilistic rules to other physical problems. Modern probability
theory owes much to the 1933 publication Foundations of the Theory of Probability by the Russian
mathematician Andrei N. Kolmogorov. He developed probability theory from an axiomatic point
of view.


Our objective in this chapter is to provide only a brief review of various definitions and facts from 
probability that are needed elsewhere in the text. Proofs are omitted in most cases. Many books are 
devoted solely to the study of probability theory and we refer to them for further details and deeper 
understanding. 
2.2 RANDOM EVENTS AND PROBABILITY


Any process whose outcome is not known in advance but is random is termed an experiment. The term 
experiment is used here in a wider sense than the usual notion of a controlled laboratory testing situa- 
tion. Thus an experiment may include observing whether a fuse is defective or not, or the duration of 
time from start to end of rain in a particular place. Assume that the experiment can be repeated any 
number of times under identical conditions. Each repetition is called a trial. A (random) experiment 
satisfies the following three conditions: (1) the set of all possible outcomes is known in advance
in each trial; (2) in any particular trial, it is not known which particular outcome will happen; and 
(3) the experiment can be repeated under identical conditions. We will now summarize some 
notations and concepts for our study of probability. 


BASIC DEFINITIONS 

1. The sample space associated with an experiment is the set consisting of all possible outcomes and
is called the sure event in the experiment. A sample space is also referred to as a probability space. A
sample space will be denoted by S.
2. An outcome in S is also called a sample point. An event A is a subset of outcomes in S, that is, $A \subset S$.
We say that an event A occurs if the outcome of the experiment is in A.
3. The null subset $\emptyset$ of S is called an impossible event.
4. The event $A \cup B$ consists of all outcomes that are in A or in B or in both.
5. The event $A \cap B$ consists of all outcomes that are in both A and B.
6. The event $A^c$ (the complement of A in S) consists of all outcomes not in A, but in S.
Using these concepts, we can define the following. All events are considered to be subsets of S. For
some more concepts from set theory, we refer to Appendix A1. 


Definition 2.2.1 Two events A and B are said to be mutually exclusive or disjoint if $A \cap B = \emptyset$. Mutually
exclusive events cannot happen together. 
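
The event operations and the notion of mutually exclusive events map directly onto set operations. The Python sketch below illustrates this with a single roll of a die; the particular events A, B, and C and the helper names `occurs` and `mutually_exclusive` are illustrative choices, not notation from the text.

```python
# Sample space for one roll of a die (illustrative choice).
S = frozenset({1, 2, 3, 4, 5, 6})

A = frozenset({2, 4, 6})        # event: an even number shows up
B = frozenset({4, 5, 6})        # event: a number greater than 3 shows up
C = frozenset({1, 3, 5})        # event: an odd number shows up

union = A | B                   # A or B (or both) occurs
intersection = A & B            # both A and B occur
complement_A = S - A            # A does not occur
impossible = frozenset()        # the null (impossible) event

def occurs(event, outcome):
    """An event occurs if the observed outcome belongs to it."""
    return outcome in event

def mutually_exclusive(e1, e2):
    """Two events are mutually exclusive if they share no outcome."""
    return not (e1 & e2)

print(sorted(union), sorted(intersection), sorted(complement_A))
print(mutually_exclusive(A, C), mutually_exclusive(A, B))
```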


The mathematical definition of probability has changed from its earliest formulation as a measure 
of belief to the modern approach of defining through the axioms. We shall discuss four definitions 
of probability. We now give an informal definition of probability. 


INFORMAL DEFINITION OF PROBABILITY 

Definition 2.2.2 The probability of an event is a measure (number) of the chance with which we can expect 
the event to occur. We assign a number between 0 and 1 inclusive to the probability of an event. A probability of 
1 means that we are 100% sure of the occurrence of an event, and a probability of 0 means that we are 100% 
sure of the nonoccurrence of the event. The probability of any event A in the sample space S is denoted by P(A). 


From this definition, we can see that P(S) = 1. The earliest approach to measuring uncertainty (in 
chance events) is the classical probability concept, which applies when all possible outcomes are 
equally likely or when the probabilities of outcomes are known. 
CLASSICAL DEFINITION OF PROBABILITY
Definition 2.2.3 If there are n equally likely possibilities, of which one must occur, and m of these are regarded 
as favorable to an event, or as “success,” then the probability of the event or a “success” is given by m/n. 


Now we give steps that can be used to compute the probabilities of events using this classical approach. 


METHOD OF COMPUTING PROBABILITY BY THE CLASSICAL APPROACH 
A. When all outcomes are equally likely 
1. Count the number of outcomes in the sample space; say this is n. 
2. Count the number of outcomes in the event of interest, A, and say this is m. 
3. P(A) = m/n. 
B. When all outcomes are not equally likely 
1. Let $O_1, O_2, \ldots, O_n$ be the outcomes of the sample space S. Let $P(O_i) = p_i$, $i = 1, 2, \ldots, n$. In this
case, the probability of each outcome, $p_i$, is assumed to be known.
2. List all the outcomes in A, say, $O_i, O_j, \ldots, O_m$.
3. $P(A) = P(O_i) + P(O_j) + \cdots + P(O_m) = p_i + p_j + \cdots + p_m$, the sum of the probabilities of the
outcomes in A.
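
The two cases of the method can be sketched as two small Python functions. The function names and the three-outcome "spinner" used for case B are hypothetical, introduced only for illustration; exact fractions are used so that results such as 1/2 are not blurred by rounding.

```python
from fractions import Fraction

def prob_equally_likely(sample_space, event):
    """Case A: P(A) = m/n with n outcomes in S and m of them in A."""
    n = len(sample_space)
    m = len(set(event) & set(sample_space))
    return Fraction(m, n)

def prob_known_weights(outcome_probs, event):
    """Case B: P(A) is the sum of the known probabilities of the outcomes in A."""
    return sum((outcome_probs[o] for o in event), Fraction(0))

# Case A: a balanced die and the event "an even number occurs".
die = [1, 2, 3, 4, 5, 6]
p_even = prob_equally_likely(die, {2, 4, 6})

# Case B: a hypothetical experiment with three outcomes of known, unequal probability.
spinner = {"red": Fraction(1, 2), "green": Fraction(1, 3), "blue": Fraction(1, 6)}
p_not_red = prob_known_weights(spinner, {"green", "blue"})

print(p_even, p_not_red)
```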


Example 2.2.1 
A balanced die (with all outcomes equally likely) is rolled. Let A be the event that an even number 
occurs. Then there are three favorable outcomes (2, 4, 6) in A, and the sample space has six elements, 
{1, 2, 3, 4, 5, 6}. Hence P(A) = 3/6 = 1/2.


Example 2.2.2 
Suppose we toss two coins. Assume that all the outcomes are equally likely (fair coins). 
(a) What is the sample space? 
(b) Let A be the event that at least one of the coins shows up heads. Find P(A). 
(c) What will be the sample space if we know that at least one of the coins showed up heads? 


Solution 
(a) The sample space consists of four outcomes, namely S = {(H, H), (H, T), (T, H), (T, T)}.
(b) The event A has three outcomes, (H, H), (H, T), and (T, H). Therefore P(A) = 3/4. 
(c) Since we know that at least one of the coins showed up heads, the possible outcomes are (H, H), 
(H, T), and (T, H). The sample space now has only three outcomes {(H, H), (H, T), (T, H)}. 
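
The three parts of this solution can be checked by listing the outcomes explicitly. The following Python sketch assumes, as in the example, that the four ordered outcomes are equally likely.

```python
from fractions import Fraction
from itertools import product

# (a) All ordered outcomes of tossing two coins.
S = list(product("HT", repeat=2))

# (b) Event A: at least one of the coins shows heads.
A = [outcome for outcome in S if "H" in outcome]
p_A = Fraction(len(A), len(S))

# (c) Knowing that A happened, only the outcomes in A remain possible.
reduced_S = A

print(S)
print(p_A, reduced_S)
```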


The classical probability concept is not applicable in situations where the various possibilities cannot 
be regarded as equally likely. Suppose we are interested in whether or not it will rain on a given 
day with known meteorological conditions. Clearly we cannot assume that the events of rain or 
no rain are equally likely. In such cases, one could use the so-called frequency interpretation of 
probability. The frequentist view is a natural extension of the classical view of probability. This
definition was developed as the result of work by R. von Mises in 1936. 


FREQUENCY DEFINITION OF PROBABILITY 
Definition 2.2.4 The probability of an outcome (event) is the proportion of times the outcome (event) 
would occur in a long run of repeated experiments. 


For example, to find the probability of heads, H, using a biased coin, we would imagine the coin
is repeatedly tossed. Let n(H) be the number of times H appears in n trials. Then the probability of
heads is defined as $P(H) = \lim_{n \to \infty} n(H)/n$.
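
The limiting relative frequency can be illustrated, though of course not proved, by simulation. In the Python sketch below the coin's true probability of heads, 0.3, and the fixed seed are arbitrary assumptions made for the illustration; the printed ratios n(H)/n tend to settle near 0.3 as n grows.

```python
import random

def relative_frequency(p_heads, n_trials, seed=0):
    """Toss a coin with P(H) = p_heads n_trials times and return n(H)/n."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    heads = sum(rng.random() < p_heads for _ in range(n_trials))
    return heads / n_trials

# p_heads = 0.3 is an arbitrary choice for the illustration.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(0.3, n))
```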


The frequency interpretation of probability is often useful. However it is not complete. Because of 
the condition of repetition under identical circumstances, the frequency definition of probability is 
not applicable to every event. For a more complete picture, it makes sense to develop the probability 
theory through axioms. Now we will define probabilities axiomatically. This definition results from 
the 1933 studies of A. N. Kolmogorov. 


AXIOMATIC DEFINITION OF PROBABILITY 
Definition 2.2.5 Let S be a sample space of an experiment. Probability P(.) is a real-valued function that 
assigns to each event A in the sample space S a number P(A), called the probability of A, with the following 
conditions satisfied: 

1. It is nonnegative, $P(A) \ge 0$.

2. It is unity for a certain event. That is, $P(S) = 1$.

3. It is additive over the union of an infinite number of pairwise disjoint events, that is, if $A_1, A_2, \ldots$ form
a sequence of pairwise mutually exclusive events (that is, $A_i \cap A_j = \emptyset$ for $i \ne j$) in S, then

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).$$


From the previous three axioms, it can be shown that $P(\emptyset) = 0$, and if $A_1, A_2, \ldots, A_n$ form a sequence of
pairwise mutually exclusive events in S, then $P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$ for a finite n. Also we could
verify that $0 \le P(A) \le 1$, for any event A. It is important to observe that the axioms do not tell us
how to assign probabilities to events.


Example 2.2.3 
A die is loaded (not all outcomes are equally likely) such that the probability that the number i shows up is 
$Ki$, $i = 1, 2, \ldots, 6$, where K is a constant. Find
(a) the value of K. 
(b) the probability that a number greater than 3 shows up. 
Solution
(a) Here the sample space S has six outcomes {1, 2,..., 6}. Hence, using axioms (2) and (3) we have 


$$P(1) + P(2) + \cdots + P(6) = 1.$$

Since $P(i) = Ki$, we have

$$(K)(1) + (K)(2) + \cdots + (K)(6) = 1 \quad \text{or} \quad (K)(1 + 2 + \cdots + 6) = (K)(21) = 1.$$

Hence $K = 1/21$.
The probability of, say, the number 5 showing up is 5/21.

(b) Let A be the event that a number greater than 3 shows up. Then the outcomes in A are {4, 5, 6}
and they are mutually exclusive. Therefore,

$$P(A) = P(4) + P(5) + P(6) = \frac{4}{21} + \frac{5}{21} + \frac{6}{21} = \frac{15}{21} = \frac{5}{7}.$$
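
The arithmetic of Example 2.2.3 can be verified with exact fractions. The Python sketch below follows the same two steps: it fixes K from the requirement that the six probabilities sum to 1, and then adds the probabilities of the outcomes 4, 5, and 6. The variable names are illustrative.

```python
from fractions import Fraction

faces = range(1, 7)

# P(i) = K * i and the probabilities must sum to 1, so K = 1 / (1 + 2 + ... + 6).
K = Fraction(1, sum(faces))
prob = {i: K * i for i in faces}

# (b) Sum the probabilities of the mutually exclusive outcomes 4, 5, and 6.
p_greater_than_3 = sum(prob[i] for i in faces if i > 3)

print(K, prob[5], p_greater_than_3)
```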


The following properties help us in going beyond the axioms to actually compute various 
probabilities. 


SOME BASIC PROPERTIES OF PROBABILITY 
For two events A and B in S, we have the following: 
1. $P(A^c) = 1 - P(A)$, where $A^c$ is the complement of the set A in S.
2. If $A \subset B$, then $P(A) \le P(B)$.
3. $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
In particular, if $A \cap B = \emptyset$, then $P(A \cup B) = P(A) + P(B)$.
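
These properties can be checked numerically on any finite sample space with equally likely outcomes. The Python sketch below uses two fair dice; the events A, B, and C are illustrative choices and are not taken from the text.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))      # two fair dice: 36 equally likely outcomes

def P(event):
    """Classical probability of an event (a subset of S)."""
    return Fraction(len(event), len(S))

A = {s for s in S if sum(s) == 7}            # the sum is 7
B = {s for s in S if s[0] == 1}              # the first die shows 1
C = {s for s in S if sum(s) >= 6}            # the sum is at least 6, so A is a subset of C

complement_rule = P(S - A) == 1 - P(A)
monotone_rule = A <= C and P(A) <= P(C)
addition_rule = P(A | B) == P(A) + P(B) - P(A & B)

print(P(A), P(B), P(A & B), P(A | B))
print(complement_rule, monotone_rule, addition_rule)
```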


Example 2.2.4 
In a large university, the freshman profile for one year’s fall admission says that 40% of the students were in
the top 10% of their high school class, that 65% are white, and that 25% are both white and in the top 10% of their
high school class. What is the probability that a freshman student selected randomly from this class either
was in the top 10% of his or her high school class or is white?


Solution 

Let $E_1$ be the event that a person chosen at random was in the top 10% of his or her high school class,
and let $E_2$ be the event that the student is white. We are given $P(E_1) = 0.40$, $P(E_2) = 0.65$, and
$P(E_1 \cap E_2) = 0.25$. Then the event that the student chosen is white or was in the top 10% of his or her
high school class is represented by $E_1 \cup E_2$. Thus

$$P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2) = 0.40 + 0.65 - 0.25 = 0.80.$$



Example 2.2.5 
A subway station in a large city has 12 gates, six inbound (entering into the subway station) and six outbound 
(exiting the subway station). The number of gates open in each direction is observed at a particular time of 
day. Assume that each outcome of the sample space is equally likely. 

(a) Define a suitable sample space. 

(b) What is the probability that at most one gate is open in each direction? 

(c) What is the probability that at least one gate is open in each direction? 

(d) What is the probability that the number of gates open is the same in both directions? 

(e) What is the probability of the event that the total number of gates open is six? 


Solution 
(a) We define the sample space to be the set of ordered pairs (x, y), where x is the number of inbound 
gates open and y is the number of outbound gates open. For example, (4,5) means four gates 
for inbound and five gates for outbound are open. (1,0) means one gate is open in the inbound 
direction and no gate is open in the outbound direction. Figure 2.1 represents the situation.

$$
S = \left\{ \begin{array}{ccccccc}
(0,0) & (0,1) & (0,2) & (0,3) & (0,4) & (0,5) & (0,6) \\
(1,0) & (1,1) & (1,2) & (1,3) & (1,4) & (1,5) & (1,6) \\
(2,0) & (2,1) & (2,2) & (2,3) & (2,4) & (2,5) & (2,6) \\
(3,0) & (3,1) & (3,2) & (3,3) & (3,4) & (3,5) & (3,6) \\
(4,0) & (4,1) & (4,2) & (4,3) & (4,4) & (4,5) & (4,6) \\
(5,0) & (5,1) & (5,2) & (5,3) & (5,4) & (5,5) & (5,6) \\
(6,0) & (6,1) & (6,2) & (6,3) & (6,4) & (6,5) & (6,6)
\end{array} \right\}
$$


We see that the sample space has 49 possible outcomes. We assume that these outcomes are equally
likely.

FIGURE 2.1 Inbound and outbound traffic.
(b) Suppose that A is the event that at most one gate is open in each direction. Then

A = {(0, 0), (0, 1), (1, 0), (1, 1)}.

Hence, $P(A) = \frac{4}{49} \approx 0.082$.

(c) Let B be the event that at least one gate is open in each direction. Then B contains 36 elements.
Hence, $P(B) = \frac{36}{49} \approx 0.7347$.

(d) Let

C = event that number of open gates is the same both ways
= {(0, 0), (1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)}.

Then $P(C) = \frac{7}{49} \approx 0.1429$.

(e) Let

D = the event that the total number of gates open is six
= {(3, 3), (2, 4), (4, 2), (5, 1), (1, 5), (6, 0), (0, 6)}.

Hence, $P(D) = \frac{7}{49} \approx 0.1429$.
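
The counts in Example 2.2.5 can be confirmed by enumerating the 49 ordered pairs. In the Python sketch below each event is described by a condition on (x, y); the helper `P` and the variable names are illustrative, and equal likelihood of the outcomes is assumed as in the example.

```python
from fractions import Fraction
from itertools import product

# Ordered pairs (x, y): x inbound gates open, y outbound gates open, each from 0 to 6.
S = list(product(range(7), repeat=2))

def P(condition):
    """Probability of the event {(x, y) in S : condition(x, y)} under equal likelihood."""
    favorable = [pair for pair in S if condition(*pair)]
    return Fraction(len(favorable), len(S))

p_b = P(lambda x, y: x <= 1 and y <= 1)    # at most one gate open in each direction
p_c = P(lambda x, y: x >= 1 and y >= 1)    # at least one gate open in each direction
p_d = P(lambda x, y: x == y)               # same number of gates open both ways
p_e = P(lambda x, y: x + y == 6)           # six gates open in total

for name, p in [("b", p_b), ("c", p_c), ("d", p_d), ("e", p_e)]:
    print(name, p, round(float(p), 4))
```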


EXERCISES 2.2 


2.2.1. Consider an experiment in which each of three cars exiting from a university main entrance 
turns right (R) or left (L). Assume that a car will turn right or left with equal probability 
of 1/2. 
(a) What is the sample space S? 
(b) What is the probability that at least one car will turn left? 
(c) What is the probability that at most one car will turn left? 
(d) What is the probability that exactly two cars will turn left? 
(e) What is the probability that all three cars will turn in the same direction? 


2.2.2. A coin is tossed three times. Define an appropriate sample space for the following cases: 
(a) The outcome of each individual toss is of interest. 
(b) Only the number of tails is of interest.


2.2.3. A pair of six-sided balanced dice are rolled. What are the probabilities of getting the sum 
of the face values as follows? 
(a) 8 
(b) 6 or 9
(c) 3, 8, or 12
(d) Not an even number 


2.2.4. An experiment has four possible outcomes A, B, C, and D. Check whether the following 
assignments of probability are possible: 
(a) P(A) = 0.20, P(B) = 0.40, P(C) = 0.09, P(D) = 0.31. 
(b) P(A) = 0.41, P(B) = 0.17, P(C) = 0.12, P(D) = 0.36. 
(c) P(A) = 1/8, P(B) = 1/2, P(C) = 1/4, P(D) = 1/8. 


2.2.5. Suppose we toss two coins and suppose that each of the four points in the sample space 
S = {(H, H), (H, T), (T, H), (T, T)} is equally likely. Let the events be A = {(H, H), (H, T)}
and B = {(H, H), (T, H)}. Find $P(A \cup B)$.


2.2.6. An urn contains 12 white, 5 yellow, and 13 black marbles. A marble is chosen at random 
from the urn, and it is noted that it is not one of the black marbles. What is the sample 
space in view of this knowledge? What is the probability that it is yellow? 


2.2.7. Two fair dice are rolled and face values are noted. 
(a) What is the probability space? 
(b) What is the probability that the sum of the numbers showing is 7? 
(c) What is the probability that both dice show number 2? 


2.2.8. In a city, 65% of people drink coffee, 50% drink tea, and 25% drink both. What is the probability
that a person chosen at random will drink at least one of coffee or tea? Will drink neither? 


2.2.9. In a fruit basket, there are five mangos, of which two are spoiled. If we were to randomly
pick two mangos: 
(a) What would be our sample space? 
(b) What is the probability that both mangos are good? 
(c) What is the probability that no more than one mango is spoiled? 


2.2.10. In a box there are three slips of paper, with one of the letters A, C, T written on each slip.
If the slips are drawn out of the box one at a time, what is the probability of obtaining the 
word CAT? 


2.2.11. Suppose that the genetic makeup of the population of a city is as in Table 2.2.1. 


Table 2.2.1

Genetic makeup    AA    Aa    aa
Probability       p     2q    r


An individual is considered to have the dominant characteristic if the person has the AA 
or Aa genetic trait. If we were to choose an individual from this city at random, what is the 
probability that this person has the dominant characteristic? 
2.2.12. Using the axioms of probability, show that $P(\emptyset) = 0$, and if $A_1, \ldots, A_n$ are pairwise
mutually exclusive, then $P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i)$.

2.2.13. Using the axioms of probability, prove the following: 
(a) If $A \subset B$, then $P(A) \le P(B)$.
(b) $P(A \cup B) = P(A) + P(B) - P(A \cap B)$. In particular, if $A \cap B = \emptyset$, then $P(A \cup B) =
P(A) + P(B)$.


2.2.14. Using the axioms of probability, show that 


$$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(A \cap C) - P(B \cap C) + P(A \cap B \cap C)$$


2.2.15. Prove that 


(a) $P(A \cap B) \ge P(A) + P(B) - 1$

(b) $P\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} P(A_i)$


2.2.16. If A and B are mutually exclusive events, P(A) = 0.17 and P(B) = 0.46, find 
(a) $P(A \cup B)$
(b) $P(A^c)$
(c) $P(A^c \cup B^c)$
(d) $P((A \cap B)^c)$
(e) $P(A^c \cap B^c)$


2.2.17. If P(A) = 0.24, P(B) = 0.67, and P(AN B) = 0.09, find 
(a) $P(A \cup B)$
(b) $P((A \cup B)^c)$
(c) $P(A^c \cup B^c)$
(d) $P((A \cap B)^c)$
(e) $P(A^c \cap B^c)$


2.2.18. In a series of seven games, the first team to win four games wins the series. If the teams are
evenly matched, what is the probability that the team that wins the first game will win the 
series? 


2.2.19. In a survey, 1000 adults were asked if they would approve an increase in tax if the revenues
went to build a football stadium. It was also noted whether the person lived in a city (C),
suburb (S), or rural area (R) of the county. The results are summarized in Table 2.2.2.
Define the following events:

A: person chosen is from the city

B: person disapproves tax increase
Find the following probabilities:
(i) $P(B)$, (ii) $P(A^c \cap B)$, and (iii) $P(A \cup B^c)$

Table 2.2.2

     Yes (for tax increase)    No (against tax increase)
C    150                       250
S    250                       150
R    50                        150


2.2.20. A couple has two children. Suppose we know the elder child is a boy. 
(a) Determine an appropriate sample space. 
(b) Find the probability that both are boys. 

2.2.21. A box contains three red and two blue flies. Two flies are removed with replacement. Let A 
be the event that both the flies are of the same color and B be the event that at least one of 
the flies is red. Find (i) $P(A)$, (ii) $P(B)$, (iii) $P(A \cup B)$, and (iv) $P(A \cap B)$.


2.2.22. Prove that for any n, 


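$$
P\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} P(A_i) - \sum_{i_1 < i_2} P(A_{i_1} \cap A_{i_2}) + \cdots
+ (-1)^{m+1} \sum_{i_1 < i_2 < \cdots < i_m} P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_m})
+ \cdots + (-1)^{n+1} P(A_1 \cap A_2 \cap \cdots \cap A_n).
$$

The summation $\sum_{i_1 < i_2 < \cdots < i_m} P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_m})$ is taken over all of the $\binom{n}{m}$ subsets of
size m from the set {1, 2, ..., n}.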


2.2.23. A sequence of events $\{A_n, n \ge 1\}$ is said to be an increasing sequence if $A_1 \subset A_2 \subset \cdots \subset
A_n \subset \cdots$, whereas it is said to be decreasing if $A_1 \supset A_2 \supset \cdots \supset A_n \supset \cdots$. If $\{A_n, n \ge 1\}$ is an
increasing sequence of events, then $\lim_{n \to \infty} A_n = \bigcup_{i=1}^{\infty} A_i$. Similarly, if $\{A_n, n \ge 1\}$ is a decreasing
sequence of events, then $\lim_{n \to \infty} A_n = \bigcap_{i=1}^{\infty} A_i$. Show that if $\{A_n, n \ge 1\}$ is either an increasing
or a decreasing sequence of events, then $\lim_{n \to \infty} P(A_n) = P\left(\lim_{n \to \infty} A_n\right)$.


2.3 COUNTING TECHNIQUES AND CALCULATION OF 
PROBABILITIES 


In a sample space with a large number of outcomes, determining the number of outcomes associ- 
ated with the events through direct enumeration could be tedious. In this section we develop some 
counting techniques and use them in probability computations. 
[Tree diagram: first-stage branches A and B, each splitting into three second-stage branches (A1, A2, A3 and B1, B2, B3).]

FIGURE 2.2 Tree diagram.


MULTIPLICATION PRINCIPLE 


Theorem 2.3.1 If the experiments $A_1, A_2, \ldots, A_m$ contain, respectively, $n_1, n_2, \ldots, n_m$ outcomes, such that
for each possible outcome of $A_1$ there are $n_2$ possible outcomes for $A_2$, and so on, then there are a total of
$n_1 n_2 \cdots n_m$ possible outcomes for the composite experiment $A_1, A_2, \ldots, A_m$.


For m = 2 and $n_1 = 2$, $n_2 = 3$, the tree diagram in Figure 2.2 illustrates the multiplication principle.
If we count the terminal branches of the tree, we get the total number of possible
outcomes for the composite experiment. In the figure, we can see that there are a total of six branches
that represent all the possible outcomes of this experiment. Tree diagrams can be used for
counting the outcomes of any finite number of composite experiments.
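
The tree-diagram count can also be checked by direct enumeration. The following is a minimal sketch in Python; the outcome labels A1, A2 and B1, B2, B3 are hypothetical names chosen for illustration, with $n_1 = 2$ and $n_2 = 3$ as in the text.

```python
from itertools import product

# Hypothetical outcome labels for two experiments with n1 = 2 and n2 = 3 outcomes.
experiment_a = ["A1", "A2"]
experiment_b = ["B1", "B2", "B3"]

# Each terminal branch of the tree is one (outcome of A, outcome of B) pair.
branches = list(product(experiment_a, experiment_b))

for branch in branches:
    print(branch)

# The number of branches equals the product n1 * n2.
print(len(branches), len(experiment_a) * len(experiment_b))
```

Listing the pairs explicitly is only practical for small experiments; for larger ones the product $n_1 n_2 \cdots n_m$ is computed directly.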


Example 2.3.1 
In how many different ways can a student club at a large university with 500 members choose its president 
and vice president? 


Solution 
The president can be chosen in 500 ways, and the vice president can then be chosen from the remaining
499 members. Hence, by the multiplication principle, there are (500)(499) = 249,500 ways in which the
complete choice can be made.


When a random sample of size k is taken from a total of n objects, the total number
of ways in which the random sample of size k can be selected depends on the particular sampling
method we employ. Here we will consider four sampling methods: (i) sampling with replacement
where the objects are ordered, (ii) sampling without replacement where the objects are ordered, (iii)
sampling without replacement where the objects are not ordered, and (iv) sampling with replacement
where the objects are not ordered.


(I) Sampling with Replacement and the Objects Are Ordered


When a random sample of size k is taken with replacement from a total of n objects and the objects
are ordered, then there are $n^k$ possible ways of selecting k-tuples.
For example, (1) if a die is rolled four times, then the sample space will consist of $6^4$ 4-tuples. (2) If
an urn contains nine balls numbered 1 to 9, and a random sample with replacement of size k = 6 is
taken, then the sample space S will consist of $9^6$ 6-tuples.


(II) Sampling without Replacement and the Objects Are Ordered 


The symbol n! (read n factorial) is defined as $n! = n(n-1)\cdots(2)(1)$. Clearly 1! = 1. By definition,
we take 0! = 1.


If r objects are chosen from a set of n distinct objects without replacement, any particular (ordered) 
arrangement of these objects is called a permutation. For example, CDAB is a permutation of the 
letters ABCD. The number of permutations of these four letters is 4! = 24, because the first position 
can be filled by any of the four letters, leaving only three possibilities for the second position, two for 
the third position, and only one for the fourth position, yielding the number of permutations to be 
$4 \cdot 3 \cdot 2 \cdot 1 = 24$.


PERMUTATION OF n OBJECTS TAKEN m AT A TIME 
Theorem 2.3.2 The number of permutations of m objects selected from a collection of n distinct objects is

$${}_nP_m = \frac{n!}{(n-m)!} = n(n-1)(n-2)\cdots(n-m+1).$$


When a random sample of size k is taken without replacement from a total of n objects and the
objects are ordered, we apply the permutation formula.


Example 2.3.2
How many distinct three-digit numbers can be formed using the digits 2, 4, 6, and 8 if no digit can be
repeated?


Solution
The number of distinct three-digit numbers will be the number of permutations of three numbers from the
set of four numbers {2, 4, 6, 8}. Hence the number of distinct three-digit numbers will be ${}_4P_3 = 4!/1! = 24$.
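
The count in Example 2.3.2 can be confirmed both by listing the numbers and by the permutation formula. This is a small sketch, assuming Python 3.8 or later for `math.perm`; the variable names are illustrative.

```python
from itertools import permutations
from math import factorial, perm

digits = [2, 4, 6, 8]
m = 3  # number of digits used in each number

# Direct enumeration of ordered arrangements without repetition.
numbers = [100 * a + 10 * b + c for a, b, c in permutations(digits, m)]

# Formula from Theorem 2.3.2: n! / (n - m)!.
by_formula = factorial(len(digits)) // factorial(len(digits) - m)

print(len(numbers), by_formula, perm(len(digits), m))
```

All three counts agree, and every listed number is distinct because the digits themselves are distinct.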


(III) Sampling without Replacement and the Objects Are Not Ordered 


Note that in a permutation, the order in which each object is selected becomes important. When the 
order of arrangement is not important—for example, if we do not distinguish between AB and BA—the 
arrangement is called a combination. We give the following result for the number of combinations.


NUMBER OF COMBINATIONS OF n OBJECTS TAKEN m AT A TIME 
Theorem 2.3.3 The number of ways in which m objects can be selected (without replacement) from a
collection of n distinct objects is

$$\binom{n}{m} = \frac{n!}{m!\,(n-m)!} = \frac{n(n-1)(n-2)\cdots(n-m+1)}{m!}.$$


The symbol $\binom{n}{m}$ is to be read as “n choose m.” When a random sample of size k is taken without
replacement from a total of n objects and the objects are not ordered, we apply the combinations
formula.


Example 2.3.3
In how many different ways can the admissions committee of a statistics department choose four foreign
graduate students from 20 foreign applicants and three U.S. students from 10 U.S. applicants?


Solution
The four foreign students can be chosen in $\binom{20}{4}$ ways, and the three U.S. students can be chosen in $\binom{10}{3}$
ways. Now, by the multiplication principle, the whole selection can be made in $\binom{20}{4}\binom{10}{3} = 581{,}400$ ways.
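
The arithmetic in Example 2.3.3 can be reproduced with the standard-library binomial coefficient. A brief sketch, assuming Python 3.8 or later for `math.comb`:

```python
from math import comb

foreign = comb(20, 4)    # ways to choose 4 of the 20 foreign applicants
domestic = comb(10, 3)   # ways to choose 3 of the 10 U.S. applicants

# Multiplication principle: the two choices are made independently.
total = foreign * domestic
print(foreign, domestic, total)
```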


(IV) Sampling with Replacement and the Objects Are Not Ordered 


In obtaining an unordered sample of size k, with replacement, from a total of n objects, k − 1
replacements will be made before sampling ceases. Thus n is increased by k − 1 so that sampling in
this manner may be thought of as drawing an unordered sample of size k from a population of size
n + k − 1. Hence, the number of possible samples can be obtained by using the formula

$$\binom{n+k-1}{k} = \frac{(n+k-1)!}{k!\,(n-1)!}.$$


Example 2.3.4
An urn contains 15 balls numbered 1 to 15. If four balls are drawn at random, with replacement and without
regard for order, how many samples are possible?


Solution
Using the previous formula, the number of possible samples is

$$\binom{15+4-1}{4} = \frac{18!}{4!\,14!} = 3060.$$
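
The four sampling methods (I) to (IV) can be compared side by side. The sketch below is illustrative only: the function names `count_samples` and `enumerate_samples` are hypothetical, and the brute-force enumeration is meant as a check for small n and k, not as a practical method.

```python
from itertools import (combinations, combinations_with_replacement,
                       permutations, product)
from math import comb, perm


def count_samples(n, k):
    """Closed-form counts for drawing k objects from n under the four methods."""
    return {
        "ordered, with replacement": n ** k,
        "ordered, without replacement": perm(n, k),
        "unordered, without replacement": comb(n, k),
        "unordered, with replacement": comb(n + k - 1, k),
    }


def enumerate_samples(n, k):
    """Brute-force counts obtained by listing every sample (small n, k only)."""
    objects = range(1, n + 1)
    return {
        "ordered, with replacement": len(list(product(objects, repeat=k))),
        "ordered, without replacement": len(list(permutations(objects, k))),
        "unordered, without replacement": len(list(combinations(objects, k))),
        "unordered, with replacement": len(
            list(combinations_with_replacement(objects, k))),
    }


print(count_samples(5, 3))
print(enumerate_samples(5, 3))
# The urn of Example 2.3.4: n = 15 balls, k = 4 draws.
print(count_samples(15, 4)["unordered, with replacement"])
```

For n = 5 and k = 3 the formulas and the enumeration agree, and the last line reproduces the count from Example 2.3.4.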


If we need to divide n objects into more than two groups, we can use the following result.

NUMBER OF COMBINATIONS OF n OBJECTS INTO m CLASSES
Theorem 2.3.4 The number of ways that n objects can be grouped into m classes with $n_i$ in the ith class,
i = 1, 2, ..., m, and $\sum_{i=1}^{m} n_i = n$, is given by

$$\binom{n}{n_1\, n_2\, \ldots\, n_m} = \frac{n!}{n_1!\, n_2! \cdots n_m!}.$$


In the foregoing theorem, the numbers $\binom{n}{n_1\, n_2\, \ldots\, n_m}$ are called multinomial coefficients.
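
A multinomial coefficient is straightforward to compute from factorials. The following is a minimal sketch; the function name `multinomial` and the class sizes used in the calls are hypothetical choices for illustration.

```python
from math import comb, factorial


def multinomial(*class_sizes):
    """Number of ways to group sum(class_sizes) objects into classes of the given sizes."""
    result = factorial(sum(class_sizes))
    for size in class_sizes:
        # Each partial quotient is an integer, so floor division is exact.
        result //= factorial(size)
    return result


# Five objects split into classes of sizes 2, 2, and 1.
print(multinomial(2, 2, 1))

# With only two classes the result reduces to a binomial coefficient.
print(multinomial(3, 7), comb(10, 3))
```

With m = 2 the multinomial coefficient coincides with "n choose m" from Theorem 2.3.3, which the second print illustrates.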


We can use the previous counting techniques to compute the probabilities of events of interest
by using the frequency interpretation of probability. Suppose that there are a total of N equally likely
possible outcomes for the experiment and let $n_A$ be the number of outcomes favoring an event A. Then
the probability of this event is $P(A) = n_A/N$. The following is a well-known problem that is called the
birthday problem.




Example 2.3.5 
In a room there are n people. What is the probability that at least two of them have a common birthday? 


Solution 

Disregarding the leap years, assume that every day of the year is equally likely to be a birthday. Let A be
the event that there are at least two people with a common birthday. There are $365^n$ possible outcomes, of
which $A^c$ can happen in $365 \times 364 \times \cdots \times (365 - n + 1)$ ways. Because the event A can happen in many more
ways, it is easier to calculate $P(A^c)$, that is, the probability that no two persons have the same birthday
or equivalently that they all have different birthdays. To count the number of n-tuples in $A^c$, because there
are no common birthdays, we can use the method of choosing distinct objects without replacement for an
ordered arrangement. Thus there are 365 possibilities to choose the first person, 364 for the second person,
..., (365 − (n − 1)) possibilities for the nth person. The product of these numbers gives the total number of
elements in $A^c$. Thus


$$P(A^c) = \frac{365 \times 364 \times \cdots \times (365 - n + 1)}{365^n}$$

and hence

$$P(A) = 1 - \frac{365 \times 364 \times \cdots \times (365 - n + 1)}{365^n}.$$

For example, if n = 3, $P(A) = 1 - \frac{(365)(364)(363)}{365^3} = 0.0082$, and if n = 40,

$$P(A) = 1 - \frac{(365)(364)\cdots(365 - 40 + 1)}{365^{40}} = 1 - 0.109 = 0.891.$$

That is, there is only a 0.82% chance of having a common birthday among three persons, whereas if n = 40,
then P(A) = 0.891; that is, the chance of having a common birthday among 40 persons increases to
89.1%. Thus, as the number of persons increases, the chance of finding people with common birthdays also
increases.
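
The birthday probability is easy to evaluate numerically by multiplying the factors of $P(A^c)$ one at a time. A minimal sketch follows; the function name `p_shared_birthday` is a hypothetical choice, and the model assumes 365 equally likely birthdays as in the example.

```python
def p_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), assuming equally likely days."""
    p_all_distinct = 1.0
    for i in range(n):
        # The (i + 1)th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


for n in (3, 23, 40):
    print(n, round(p_shared_birthday(n), 4))
```

Multiplying ratios term by term avoids forming the very large numbers $365^n$ and the falling product explicitly.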



Example 2.3.6 
In a tank containing 10 fishes, there are three yellow and seven black fishes. We select three fishes at 
random. 


(a) What is the probability that exactly one yellow fish gets selected? 
(b) What is the probability that at most one yellow fish gets selected? 
(c) What is the probability that at least one yellow fish gets selected? 


Solution 
Let A be the event that exactly one yellow fish gets selected, and B be the event that at most one yellow
fish gets selected. There are $\binom{10}{3} = 120$ ways to select three fishes from 10.

(a) There are $\binom{3}{1} = 3$ ways to select a yellow fish and $\binom{7}{2} = 21$ ways to select two black fishes. By the
multiplication rule, the probability of selecting exactly one yellow fish is

$$P(A) = \frac{\binom{3}{1}\binom{7}{2}}{\binom{10}{3}} = \frac{63}{120} = 0.525.$$

(b) The probability that at most one yellow fish gets selected is the same as the probability of selecting
none or one, which is

$$P(B) = \frac{\binom{3}{1}\binom{7}{2} + \binom{3}{0}\binom{7}{3}}{\binom{10}{3}} = 0.525 + 0.292 = 0.817.$$

(c) The probability that at least one yellow fish gets selected is the same as 1 − P(none), which is
1 − 0.292 = 0.708.
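
The three answers in Example 2.3.6 all come from the same counting pattern: choose k yellow fish and the remaining fish from the black ones. A short sketch under that reading; the function name `p_yellow` and its default arguments are hypothetical.

```python
from math import comb


def p_yellow(k, yellow=3, black=7, drawn=3):
    """P(exactly k yellow fish) when `drawn` fish are taken without replacement."""
    return comb(yellow, k) * comb(black, drawn - k) / comb(yellow + black, drawn)


p_exactly_one = p_yellow(1)
p_at_most_one = p_yellow(0) + p_yellow(1)
p_at_least_one = 1 - p_yellow(0)

print(round(p_exactly_one, 3), round(p_at_most_one, 3), round(p_at_least_one, 3))
```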
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Example 2.3.7 
Refer to Example 2.3.3. Suppose that the admissions committee decides to randomly choose seven graduate
students from a pool of 30 applicants, of whom 20 are foreign and 10 are U.S. applicants. What is the
probability that a chosen seven will have four foreign students and three U.S. students?

Solution
As in Example 2.3.3, the number of ways of selecting four foreign and three U.S. students is

$$\binom{20}{4}\binom{10}{3} = 581{,}400.$$

The number of ways of selecting seven applicants out of 30 is

$$\binom{30}{7} = 2{,}035{,}800.$$

Hence the probability that a randomly selected group of seven will consist of four foreign and three U.S.
students is

$$\frac{\binom{20}{4}\binom{10}{3}}{\binom{30}{7}} = \frac{581{,}400}{2{,}035{,}800} = 0.2856.$$


EXERCISES 2.3


2.3.1. Determine the following: 


(i) (2). (ii) @, (iii) () (iv) Cale) and (v) ( ; a a 


2.3.2. A game in a state lottery selects four numbers from a set of numbers, {0,1,2,3,4,5,6,7,8,9},
with no number being repeated. How many groups of four numbers are possible?


2.3.3. A 10-bit binary word is a sequence of 10 digits, of which each may be either a 1 or a 0. How 
many 10-bit words are there? 


2.3.4. Insulin, a peptide hormone built from 51 amino acid residues, is one of the smallest proteins 
known (note that proteins are made up of chains of amino acids) with a molecular weight 
of 5808 Da. Twenty amino acids are encoded by the standard genetic code, that is, proteins 
are built from a basic set of 20 amino acids. How many possible proteins of length 51 can 
be made with 20 amino acids for each position in the protein? 


2.3.5. An examination is designed where the students are required to answer any 20 questions 
from a group of 25 questions. How many ways can a student choose the 20 questions? 


2.3.6. How many different six-place license plates are possible if the first three places and the last 
place are to be occupied by letters and the fourth and fifth places are to be occupied by 
numbers? 


2.3.7. In how many different ways can 15 tickets to a football game be distributed among a class
of 30 students if each student gets at most one ticket?

2.3.8. How many different four-letter English words (with or without meaning) can be written
using distinct letters from the alphabet?


2.3.9. DNA (deoxyribonucleic acid) is made from a sequence of four nucleotides (A, T, G, or C). 
Suppose a region of DNA is 40 nucleotides long. How many possible nucleotide sequences 
are there in this region of DNA? 


2.3.10. Show that 
 (3)=(1)- 
 ()=(a)C arenes 
0 (3-24) 


2.3.11. A lot of 50 electrical components numbered 1 to 50 is drawn at random, one by one, and is 
divided among five customers. 
(a) Suppose that it is known that components 3, 18, 12, 26, and 46 are defective. What is 
the probability that each customer will receive one defective component? 
(b) What is the probability that one customer will have drawn five defective components? 
(c) What is the probability that two customers will receive two defective components each, 
two none, and the other one? 


2.3.12. A package of 15 apples contains two defective apples. Four apples are selected at random. 
(a) Find the probability that none of the selected apples is defective. 
(b) Find the probability that at least one of the selected apples is defective. 


2.3.13. A homeowner wants to repaint her home and install new carpets (no store where she lives
sells both paint and carpet). She plans to get the services from the stores where she buys 
the paint and carpet. Suppose there are 12 paint stores with painting service available and 
15 carpet stores with installation services available in that city. In how many ways can she 
choose these two stores? 


2.3.14. From an urn containing 15 white, 7 black, and 8 yellow balls a sample of 3 balls is drawn 
at random. Find the probability that 
(a) All three balls are yellow. 
(b) All three balls are of the same color. 
(c) All three balls are of different colors. 


2.3.15. Refer to Example 2.3.5. Compute P(A) for (a) n = 20, (b) n = 30. Estimate n if you wish to
have an approximately 50% chance of finding someone who shares your birthday. 


2.3.16. A box of manufactured items contains 12 items, of which four are defective. If three items
are drawn at random without replacement, what is the probability that
(a) The first one is defective and the rest are good?
(b) Exactly one of the three is defective?

2.3.17. Five white and four black balls are arranged in a row. What is the probability that the end
balls are of different colors?


2.3.18. Three numbers are chosen at random from the numbers {1, 2, ..., 9}. What is the probability 
that the middle number is 5? 


2.3.19. In each of the following, find the number of elements in the resulting sample space. 

(a) If a die is rolled five times, how many elements are there in the sample space?
(b) If 13 cards are selected from a deck of 52 playing cards without replacement, and the
order in which the cards are drawn is important, how many elements are there in the
sample space?
(c) Four players in a game of bridge are dealt 13 cards each from an ordinary deck of 52
cards. What is the total number of ways in which we can deal the 13 cards to the four
players?
(d) If a football squad consists of 72 players, how many selections of 11-man teams are
possible?


2.3.20. In Florida Lotto, an urn contains balls numbered 1 to 53. From this urn, a machine chooses
six balls at random and without replacement. The order in which the balls are selected does
not matter. For a $1 bet, a player chooses six numbers. If all six numbers match the six
numbers drawn from the urn, the player wins the jackpot. What is the probability of winning
the Florida Lotto jackpot?


2.3.21. The cells in our bodies receive half of their chromosomes from the father and the other 
half from the mother. So for each pair of homologous chromosomes one will be a paternal 
chromosome and one will be a maternal chromosome. We have 23 pairs of homologous 
chromosomes. 

(a) How many possible combinations of paternal and maternal chromosomes are there? 
(b) What is the probability of getting a gamete with nine paternal and 14 maternal 
chromosomes? Assume that any ordered combination is equally likely. 


2.4 THE CONDITIONAL PROBABILITY, INDEPENDENCE, AND BAYES’ RULE 


If we know that an event has already occurred or we have some partial information about the event, 
then this knowledge may affect the probability of the event of interest. For example, if we were to
guess the probability of rain today, the answer would be different depending on whether we are
sitting inside a windowless office or we are outside and can see the formation of heavy clouds. This 
leads to the idea of conditional probability. 


Definition 2.4.1 The conditional probability of an event A, given that an event B has occurred, denoted
by P(A|B), is equal to

$$P(A|B) = \frac{P(A \cap B)}{P(B)},$$

provided P(B) > 0.

Example 2.4.1
We toss two balanced dice, and let A be the event that the sum of the face values of two dice is 8, and B 
be the event that the face value of the first one is 3. Calculate P(A|B). 


Solution 
The elements of the events A and B are

A = {(2, 6), (6, 2), (3, 5), (5, 3), (4, 4)}

and

B = {(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6)}.

Now $A \cap B = \{(3, 5)\}$, so P(A) = 5/36, P(B) = 6/36, and $P(A \cap B) = 1/36$.

Therefore,

$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{1/36}{6/36} = \frac{1}{6}.$$
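
The conditional probability in Example 2.4.1 can be verified by enumerating the 36 equally likely outcomes of two dice. This is a small sketch; the helper `prob` and the event functions are hypothetical names, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (first die, second die) of two balanced dice.
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a true/false function of an outcome."""
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))


def sum_is_8(outcome):
    return outcome[0] + outcome[1] == 8


def first_is_3(outcome):
    return outcome[0] == 3


p_a = prob(sum_is_8)
p_b = prob(first_is_3)
p_a_and_b = prob(lambda o: sum_is_8(o) and first_is_3(o))

# Definition 2.4.1: P(A|B) = P(A and B) / P(B).
p_a_given_b = p_a_and_b / p_b
print(p_a, p_b, p_a_and_b, p_a_given_b)
```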


It is important to note that the conditional probability $P(\cdot|B)$ is a probability on B. It satisfies all the
axioms of a probability.


SOME PROPERTIES OF CONDITIONAL PROBABILITY 
1. If $E_1 \subset E_2$, then $P(E_1|A) \le P(E_2|A)$.
2. $P(E|A) = 1 - P(E^c|A)$.
3. $P(E_1 \cup E_2|A) = P(E_1|A) + P(E_2|A) - P(E_1 \cap E_2|A)$.
4. Multiplication law: $P(A \cap B) = P(B)P(A|B) = P(A)P(B|A)$.
In general,

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = P(A_1)P(A_2|A_1)P(A_3|A_1 \cap A_2) \cdots P(A_n|A_1 \cap A_2 \cap \cdots \cap A_{n-1}).$$




Example 2.4.2 
A fruit basket contains 25 apples and oranges, of which 20 are apples. If two fruits are randomly picked in 
sequence, what is the probability that both the fruits are apples? 


Solution 
Let 


A = {event that the first fruit is an apple}


B = {event that the second fruit is an apple}.

We need to find $P(A \cap B)$. We have

P(A) = 20/25, P(B|A) = 19/24.

Now using the multiplication principle for conditional probabilities,

$$P(A \cap B) = P(A)P(B|A) = \left(\frac{20}{25}\right)\left(\frac{19}{24}\right) = 0.633.$$

Hence the probability that both the fruits are apples is 0.633.


Probability and statistics are proving to be very useful in the field of genetics. Genetics is the study 
of heredity—traits transmitted from parent to offspring. The starting point of the subject of genetics 
as presently known can be attributed to Gregor Mendel (1822-1884), an Austrian monk. During the 
1850s Mendel was interested in plant breeding. He performed careful experiments with the garden 
pea, Pisum sativum, and uncovered the basic principles of genetic inheritance. Mendel discovered that 
traits are inherited in discrete units (known as genes). Mendel’s law of independent segregation states 
that the parent transmits randomly one of its traits to the offspring. Geneticists use letters to represent 
alleles. A capital letter is used to represent a dominant trait, and a lowercase letter is used to represent a 
recessive trait. A dominant allele can be observed in the organism’s appearance or physiology, whereas 
a recessive allele cannot be observed unless the individual has two copies of the recessive allele. 




Example 2.4.3 
Suppose we are given a population with the following genetic distribution: 


Genetic makeup | AA | Aa | aa
Probability    | p  | 2q | r


Alleles are randomly donated from parents to offspring. Assuming random mating, what is the probability 
that the mating is Aa x Aa and the offspring is aa (recessive trait)? 


Solution 

Let B denote the event that the mating is Aa × Aa, and C denote the event that the offspring is aa. Then we
have $P(B) = (2q)(2q) = 4q^2$. Because the alleles are randomly donated from parents to offspring, $P(C|B) = 1/4$. Now,
using the multiplication principle for conditional probabilities,

$$P(B \cap C) = P(B)P(C|B) = 4q^2\left(\frac{1}{4}\right) = q^2.$$

Hence the probability that the mating is Aa × Aa and the offspring is of the recessive trait is $q^2$.


In order to compute probabilities similar to that in Example 2.4.3, we could use Table 2.1. The
distributions of the progeny (zygotes) are the predicted values from Mendel’s law.

Table 2.1 The Distribution of Zygotes

Mating      Probability    Probability of zygotes (offspring)
            of mating      AA      Aa      aa

AA × AA     $p^2$          1       0       0
AA × Aa     $4pq$          1/2     1/2     0
AA × aa     $2pr$          0       1       0
Aa × Aa     $4q^2$         1/4     1/2     1/4
Aa × aa     $4qr$          0       1/2     1/2
aa × aa     $r^2$          0       0       1
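
The mating probabilities under random mating can be derived from the genotype distribution rather than memorized. The sketch below uses hypothetical numerical values for p, q, and r (chosen only so that p + 2q + r = 1) and checks the calculation of Example 2.4.3 with exact fractions.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical genotype probabilities: P(AA) = p, P(Aa) = 2q, P(aa) = r.
p, q, r = Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)
genotype = {"AA": p, "Aa": 2 * q, "aa": r}

# Probability that a parent of each genotype donates the allele "a".
donates_a = {"AA": Fraction(0), "Aa": Fraction(1, 2), "aa": Fraction(1)}

# Random mating: the two parents are drawn independently. Ordered pairs are
# merged into unordered matings such as ("AA", "Aa").
mating = defaultdict(Fraction)
for g1, p1 in genotype.items():
    for g2, p2 in genotype.items():
        mating[tuple(sorted((g1, g2)))] += p1 * p2

# Example 2.4.3: B = mating is Aa x Aa, C = offspring is aa.
p_b = mating[("Aa", "Aa")]
p_c_given_b = donates_a["Aa"] * donates_a["Aa"]
p_b_and_c = p_b * p_c_given_b

print(p_b, p_c_given_b, p_b_and_c)
print(sum(mating.values()))
```

For these values the joint probability equals $q^2$, and the six mating probabilities add up to 1, as they must for a complete list of matings.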


If the occurrence of one event has no effect on the occurrence of another event, then those two events 
are said to be independent of each other. Thus we have the following definition. 


Definition 2.4.2 Two events A and B with $P(A) \ne 0$ and $P(B) \ne 0$ are said to be independent if
P(A|B) = P(A), or P(B|A) = P(B). Otherwise, A and B are dependent.


As a consequence of the foregoing definition, two events A and B are independent if and only if
$P(A \cap B) = P(A)P(B)$ and at least one of P(A) or P(B) is not zero. An alternative definition of
independence of two events A and B can be based on this equality. That is, two events A and B are
said to be independent if

$$P(A \cap B) = P(A)P(B).$$

In this case it is not necessary to assume that at least one of P(A) or P(B) is not zero.

Example 2.4.4
Suppose that we toss two fair dice. Let $E_1$ denote the event that the sum of the dice is 6 and $E_2$ denote the
event that the first die equals 4. Then, $P(E_1 \cap E_2) = P(\{(4, 2)\}) = 1/36 \ne P(E_1)P(E_2) = 5/216$. Hence
$E_1$ and $E_2$ are dependent events.
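
The product test for independence can be applied mechanically to events on the two-dice sample space. A minimal sketch follows; `prob` and `are_independent` are hypothetical helper names, and the second pair of events (sum equal to 7) is added here only as a contrast to the example.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two fair dice.
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    favorable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favorable, len(sample_space))


def are_independent(event_a, event_b):
    """Check P(A and B) == P(A) P(B) exactly."""
    joint = prob(lambda o: event_a(o) and event_b(o))
    return joint == prob(event_a) * prob(event_b)


def first_is_4(outcome):
    return outcome[0] == 4


def sum_is_6(outcome):
    return sum(outcome) == 6


def sum_is_7(outcome):
    return sum(outcome) == 7


print(are_independent(sum_is_6, first_is_4))  # the events of Example 2.4.4
print(are_independent(sum_is_7, first_is_4))  # a contrasting pair
```

The first check fails the product rule, in agreement with the example, while a sum of 7 turns out to be independent of the first die showing 4.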


Definition 2.4.3 The k events $A_1, A_2, \ldots, A_k$ are mutually independent if for every j = 2, 3, ..., k
and every subset of distinct indices $i_1, i_2, \ldots, i_j$,

$$P(A_{i_1} \cap A_{i_2} \cap \cdots \cap A_{i_j}) = P(A_{i_1})P(A_{i_2}) \cdots P(A_{i_j}).$$

Mutually independent events will often be called independent. In particular, if $P(A_i \cap A_j) = P(A_i)P(A_j)$
for each $i \ne j$, then the events are called pairwise independent.


Now we will discuss computation of the probability $P(A_i|B)$ (called posterior probability) from the
given prior probabilities $P(A_i)$ and conditional probabilities $P(B|A_i)$. First we will state the total
probability rule.




LAW OF TOTAL PROBABILITY 
Theorem 2.4.1 Assume $S = A_1 \cup A_2 \cup \cdots \cup A_n$, where $P(A_i) > 0$, i = 1, 2, ..., n, and $A_i \cap A_j = \emptyset$ (null
set) for $i \ne j$. Then for any event B,

$$P(B) = \sum_{i=1}^{n} P(A_i)P(B|A_i).$$


The set $A_1, A_2, \ldots, A_n$ given in Theorem 2.4.1 is called a partition of S.


Example 2.4.5 
Assume that a noisy channel independently transmits symbols, say 0s 60% of the time and 1s 40% of 
the time. At the receiver, there is a 1% chance of obtaining any particular symbol distorted. What is the 
probability of receiving a 1, irrespective of which symbol is transmitted? 


Solution 
Given

P(0) = P(‘0’ is transmitted) = 0.6
and
P(1) = P(‘1’ is transmitted) = 0.4.

Also, we are given that the probability that a particular symbol is distorted is 0.01; that is,

P(1|0) = P(1 is received | 0 is transmitted)
= 0.01 = P(0|1) = P(0 is received | 1 is transmitted).

Hence P(1|1) = 1 − 0.01 = 0.99, and from the total probability rule, the probability of receiving a 1 is

P(receive a 1) = P(1|0)P(0) + P(1|1)P(1)
= (0.01)(0.6) + (0.99)(0.4) = 0.402.


Hence, irrespective of whether a 0 or 1 is transmitted, the probability of receiving a 1 is 0.402.
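
The law of total probability is a weighted sum over the partition, which makes it simple to code. The sketch below is illustrative; the function name `total_probability` and the list layout (one entry per event of the partition) are assumptions made for this example.

```python
def total_probability(priors, likelihoods):
    """P(B) = sum over i of P(A_i) * P(B | A_i) for a partition A_1, ..., A_n."""
    if abs(sum(priors) - 1.0) > 1e-12:
        raise ValueError("the priors of a partition must sum to 1")
    return sum(p * l for p, l in zip(priors, likelihoods))


# Noisy channel: partition by the transmitted symbol (0 sent, 1 sent).
p_sent = [0.6, 0.4]
p_receive_1_given_sent = [0.01, 0.99]

print(round(total_probability(p_sent, p_receive_1_given_sent), 3))
```

The same function applies to any example in this section once the partition and the conditional probabilities have been listed.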
Example 2.4.6 
During an epidemic in a town, 40% of its inhabitants became sick. Of any 100 sick persons, 10 will need to 
be admitted to an emergency ward. What is the probability that a randomly chosen person from this town 
will be admitted to an emergency ward? 


Solution 
Let 


A = {the person is healthy}
and
B = {the person is admitted to an emergency ward}.

It is given that $P(A^c) = 0.4$. Hence, $P(A) = 0.6$.


We want to find P(B). Now P(B|A) = 0, because a healthy person will not be admitted to an emergency
ward. Also,

$$P(B|A^c) = \frac{10}{100} = 0.1.$$

Hence, by the total probability rule,

$$P(B) = P(A)P(B|A) + P(A^c)P(B|A^c) = (0.6)(0) + (0.4)(0.1) = 0.04.$$


Sometimes it is not possible to directly calculate the conditional probability that is needed but other 
probabilities related to the probability in question are available. Bayes’ rule shows how probabilities 
change in the light of information and how to calculate them. It is also an essential tool in the 
Bayesian inference. Bayes’ theorem is named after an English clergyman, Reverend Thomas Bayes, 
who outlined the result in a paper published (posthumously) in 1763. This is one of those results 
that we can prove relatively easily. However, the implications of this result are profound in statistics 
and many other applied fields; see Chapter 11. 


BAYES’ RULE 
Theorem 2.4.2 Assume $S = A_1 \cup A_2 \cup \cdots \cup A_n$, where $P(A_i) > 0$, i = 1, 2, ..., n, and $A_i \cap A_j = \emptyset$ for
$i \ne j$. Then for any event B with P(B) > 0,

$$P(A_j|B) = \frac{P(A_j)P(B|A_j)}{\sum_{i=1}^{n} P(A_i)P(B|A_i)}.$$

Proof. We have

$$P(A_j|B) = \frac{P(A_j \cap B)}{P(B)} = \frac{P(A_j \cap B)}{\sum_{i=1}^{n} P(A_i)P(B|A_i)} \quad \text{(by the total probability rule for } P(B)\text{)}$$

$$= \frac{P(A_j)P(B|A_j)}{\sum_{i=1}^{n} P(A_i)P(B|A_i)}.$$


In Bayes’ theorem, the probabilities $P(A_i)$ are called the prior or a priori probabilities of the events
$A_i$, and the conditional probability $P(A_i|B)$ is called the posterior probability of the event $A_i$. The
events $A_1, \ldots, A_n$ are sometimes called the states of nature.

Example 2.4.7
Suppose a statistics class contains 70% male and 30% female students. It is known that in a test, 5% of males 
and 10% of females got an “A” grade. If one student from this class is randomly selected and observed to 
have an “A” grade, what is the probability that this is a male student? 


Solution 

Let $A_1$ denote that the selected student is a male, and $A_2$ denote that the selected student is a female. Here
the sample space $S = A_1 \cup A_2$. Let D denote that the selected student has an “A” grade. We are given
$P(A_1) = 0.7$, $P(A_2) = 0.3$, $P(D|A_1) = 0.05$, and $P(D|A_2) = 0.10$. Then by the total probability rule,

$$P(D) = P(A_1)P(D|A_1) + P(A_2)P(D|A_2) = 0.035 + 0.030 = 0.065.$$

Now by Bayes’ rule,

$$P(A_1|D) = \frac{P(A_1)P(D|A_1)}{P(A_1)P(D|A_1) + P(A_2)P(D|A_2)} = \frac{(0.7)(0.05)}{0.065} = \frac{7}{13} = 0.538.$$

This shows that even though the probability of a male student getting an “A” grade is smaller than that for
a female student, because of the larger number of male students in the class, a male student with an “A”
grade has a greater probability of being selected than a female student with an “A” grade.


STEPS TO APPLY BAYES’ RULE 
To find $P(A_i|D)$:
1. List all the probabilities including conditional probabilities given in the problem. That is,
$P(A_1), \ldots, P(A_n)$ and $P(D|A_1), \ldots, P(D|A_n)$.
2. Write the numerator as the product $P(A_i)P(D|A_i)$.
3. Using the total probability rule, find the denominator probability in Bayes’ rule.
4. The desired probability is Numerator/Denominator.

Example 2.4.8
Suppose that three types of antimissile defense systems are being tested. From the design point of view, 
each of these systems has an equally likely chance of detecting and destroying an incoming missile within a 
range of 250 miles with a speed ranging up to nine times the speed of sound. However, in actual practice it 
has been observed that the precisions of these antimissile systems are not the same; that is, the first system 
will usually detect and destroy the target 10 of 12 times, the second will detect and destroy it 9 of 12 times, 
and the third will detect and destroy it 8 of 12 times. We have observed that a target has been detected 
and destroyed. What is the probability that the antimissile defense system was of the third type? 


Solution 

Let $S_1$, $S_2$, and $S_3$ be the events that the first, second, and third antimissile defense systems, respectively, are
used. Also let D be the event that the target has been detected and destroyed. We wish to find $P(S_3|D)$.
Given that $P(S_1) = P(S_2) = P(S_3) = 1/3$, $P(D|S_1) = 10/12$, $P(D|S_2) = 9/12$, and $P(D|S_3) = 8/12$.
By the total probability rule,

$$P(D) = P(S_1)P(D|S_1) + P(S_2)P(D|S_2) + P(S_3)P(D|S_3) = \left(\frac{1}{3}\right)\left(\frac{10}{12}\right) + \left(\frac{1}{3}\right)\left(\frac{9}{12}\right) + \left(\frac{1}{3}\right)\left(\frac{8}{12}\right) = 0.75.$$

Now using the Bayes formula, we have

$$P(S_3|D) = \frac{P(S_3)P(D|S_3)}{P(D)} = \frac{(1/3)(8/12)}{0.75} = \frac{8}{27} = 0.2963.$$

If the target is destroyed, then the probability that the antimissile defense system was of the third type is
0.2963.
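
The procedure for applying Bayes’ rule (list the probabilities, form the numerator, obtain the denominator from the total probability rule, divide) can be sketched in a few lines. The function name `posterior` and the zero-based index argument are hypothetical choices; exact fractions are used so the results can be compared with 7/13 and 8/27.

```python
from fractions import Fraction


def posterior(priors, likelihoods, j):
    """P(A_j | D) from priors P(A_i) and likelihoods P(D | A_i); j is zero-based."""
    numerator = priors[j] * likelihoods[j]
    # Denominator: P(D) by the total probability rule.
    denominator = sum(p * l for p, l in zip(priors, likelihoods))
    return numerator / denominator


# Example 2.4.7: partition (male, female), D = student has an "A" grade.
p_male_given_a = posterior(
    [Fraction(7, 10), Fraction(3, 10)],
    [Fraction(5, 100), Fraction(10, 100)],
    0,
)

# Example 2.4.8: three equally likely systems, D = target destroyed.
p_third_given_d = posterior(
    [Fraction(1, 3)] * 3,
    [Fraction(10, 12), Fraction(9, 12), Fraction(8, 12)],
    2,
)

print(p_male_given_a, float(p_male_given_a))
print(p_third_given_d, float(p_third_given_d))
```

Because the denominator is the same for every j, the posterior probabilities over the whole partition always sum to 1.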


EXERCISES 2.4 


2.4.1. Consider the portion of an electric circuit with three relays shown in Figure 2.3. Current 
will flow from point a to point b if at least one of the relays closes properly when activated. 


FIGURE 2.3

The relays may malfunction and not close properly when activated. Suppose that the relays
act independently of one another and close properly when activated with probability 0.9.
(a) What is the probability that current will flow when the relays are activated?
(b) Given that current flowed when the relays were activated, what is the probability that
relay 1 functioned?


2.4.2. If P(A) > 0, P(B) > 0, and P(A) < P(A|B), show that P(B) < P(B|A).


2.4.3. If P(B) > 0,
(a) Show that $P(A|B) + P(A^c|B) = 1$.
(b) Show that in general the following two statements are false: (i) $P(A|B) + P(A|B^c) = 1$,
(ii) $P(A|B) + P(A^c|B^c) = 1$.


2.4.4. If $P(B) = p$, $P(A^c|B) = q$, and $P(A^c \cap B^c) = r$, find (a) $P(A \cap B^c)$, (b) $P(A)$, and (c) $P(B|A)$.


2.4.5. If A and B are independent, show that so are (i) $A^c$ and B, (ii) A and $B^c$, and (iii) $A^c$
and $B^c$.


2.4.6. Show that two events A and B are independent if and only if $P(A \cap B) = P(A)P(B)$ when
at least one of P(A) or P(B) is not zero.


2.4.7. A card is selected at random from an ordinary deck of 52 playing cards. If E is the event
that the selected card is an ace and F is the event that it is a spade, show that E and F are 
independent events. 


2.4.8. A fruit basket contains 30 apples, of which five are bad. If you pick two apples at random, 
what is the probability that both are good apples? 


2.4.9. Two students are to be selected at random from a class with 10 girls and 12 boys. What is 
the probability that both will be girls? 


2.4.10. Assume a population with the genetic distribution given in Example 2.4.3. Assume random 
mating. What is the probability that an offspring is aa? 


2.4.11. One of the most common forms of colorblindness is a sex-linked hereditary condition 
caused by a defect on the X chromosome (one of the two chromosomes that determine 
gender). It is known that colorblindness is much more prevalent in males than in females. 
Suppose that 6% of males are colorblind but only 0.75% of females are colorblind. In a 
certain population, 45% are male and 55% are female. A person is randomly selected from 
this population. 

(a) Find the probability that the person is colorblind. 
(b) Find the probability that the person is colorblind given that the person is a male. 


2.4.12. A survey asked a group of 400 people whether or not they were doing daily exercise. The 
responses by sex and physical activity are as in Table 2.4.1. 
A person is randomly selected. 
(a) What is the probability that this person is doing daily exercise? 
(b) What is the probability that this person is doing daily exercise if we know that this
person is a male?

Table 2.4.1

                     Male    Female
Daily exercise       50      61
No daily exercise    177     112


2.4.13. A laboratory blood test is 98% effective in detecting a certain disease if the person has 
the disease (sensitivity). However, the test also yields a “false positive” result for 0.5% of 
the healthy persons tested. (That is, if a healthy person is tested, then, with probability 
0.005, the test result will show positive.) Assume that 2% of the population actually has 
this disease (prevalence). What is the probability a person has the disease given that the 
test result is positive? 


2.4.14. In order to evaluate the rate of error experienced in reading chest x-rays, the following 
experiment is done. Several people with known tuberculosis (TB) status (through other 
reliable tests) are subjected to chest x-rays. A technician who is unaware of this status reads 
the x-ray, and Table 2.4.2 gives the result. Here +x-ray means the technician concluded that 
the person has TB. 


Table 2.4.2 


           Person without TB   Person with TB   Total

+X-ray            70                 27            97
−X-ray          1883                 20          1903
Total           1945                 55          2000


Find (a) P(TB | +X-ray), (b) P(+X-ray | No TB), and (c) P(No TB | −X-ray).


2.4.15. Each of the 12 ordered boxes contains 12 coins, consisting of pennies and dimes. The 
number of dimes in each box is equal to its order among the boxes, that is, box number 1 
contains one dime and 11 pennies, box number 2 contains two dimes and 10 pennies, etc. 
A pair of fair dice is tossed, and the total showing indicates which box is chosen to have a 
coin selected at random from it. 

(a) Find the probability that a coin selected is a dime. 
(b) It is observed that the selected coin is a penny. Find the probability that it came from 
box number 4. 

2.4.16. Of 600 car parts produced, it is known that 350 are produced in one plant, 150 parts in 
a second plant, and 100 parts in a third plant. Also it is known that the probabilities are 
0.15, 0.2, and 0.25 that the parts will be defective if they are produced in the first, second, 
or third plants, respectively. What is the probability that a randomly picked part from this 
batch is not defective? 


2.4.17. One class contains 5 girls and 10 boys and a second class contains 13 boys and 12 girls. 
A student is randomly picked from the second class and transferred to the first one. After
that, a student is randomly chosen from the first class. What is the probability that this
student is a boy? 


2.4.18. Consider that we have in an industrial complex two large boxes, each of which contains 30 
electrical components. It is known that the first box contains 26 operable and 4 nonoperable 
components and that the second box contains 28 operable and 2 nonoperable components. 
Assume that the probability of making a selection from each of the boxes is the same. 

(a) Find the probability that a component selected at random will be operable. 
(b) Suppose the component chosen at random is operable. Find the probability that the 
component was chosen from box 1. 

2.4.19. Urn 1 contains five white balls and three red balls. Urn 2 contains four white and six red 
balls. An urn is selected at random, and a ball is drawn at random from that urn. Find the 
probability that, if the ball selected is white, it came from urn 1. 


2.4.20. An urn contains two white balls and two black balls. A number is randomly chosen from 
the set {1, 2, 3, 4}, and that many balls are removed from the urn. Find the probability that the
number i, i = 1, 2, 3, 4, was chosen if at least one white ball was removed from the urn. 


2.4.21. A certain state groups its licensed drivers according to age into the following categories: (1)
16 to 25; (2) 26 to 45; (3) 46 to 65; and (4) over 65. Table 2.4.3 lists, for each group, the 
proportion of licensed drivers who belong to the group and the proportion of drivers in 
the group who had accidents. 


Table 2.4.3 


Group Size Accident rate 


1 0.250 0.086 
2 0.257 0.044 
3 0.347 0.056 
4 0.146 0.098 


(a) What proportion of licensed drivers had an accident? 
(b) What proportion of those licensed drivers who had an accident were over 65? 


2.4.22. It is known that a rare disease, K, is present in only 0.2% of the population. The performance
of a physician's diagnostic test for the presence or absence of the disease K is
given in Table 2.4.4, where R+ denotes a positive test result and R− denotes a negative
result. Also, K^c denotes absence of the disease.


Table 2.4.4

        R+      R−
K      0.98    0.02
K^c    0.01    0.99

(a) What is the probability that a patient has the disease, if the test result is positive?
(b) What is the probability that a patient has the disease, if the test result is negative? 


2.4.23. A store has light bulbs from two suppliers, 1 and 2. The chance of supplier 1 delivering 
defective bulbs is 10%, whereas supplier 2 has a defective rate of 3%. Suppose 60% of the 
current supply of light bulbs came from supplier 1. If one of these bulbs is taken from the 
current supply and observed to be defective, find the probability that it came from supplier 2. 


2.4.24. The quality control chart of a certain manufacturing company shows that 45% of the
defective parts produced in the company are due to mechanical errors and 55% are caused
by human error. The defective parts caused by mechanical errors can be detected, with a 95%
accuracy rate, at an inspection station. The detection rate is only 80% if the defective parts
are due to human error.

(a) Suppose a defective part was detected at the inspection station. What is the probability 
that this defective part is due to human error? 

(b) Suppose that a customer returned a defective part that went undetected at the inspection 
station. What is the probability that the defective part is due to human error? 


2.4.25. A circuit has three major components: A, B, and C. Component A operates independently 
of B and C. The components B and C are interdependent. It is known that the component 
A works properly 85% of the time; component B, 90% of the time; and component C, 
95% of the time. However, if component C fails, there is a 75% chance that B will also 
fail. Assume that at least two parts must operate for the circuit to function. What is the 
probability that the circuit will function properly? 


2.4.26. Suppose that the data in Table 2.4.5 represent approximate distribution of blood type 
frequency in percentage of total population. 


Table 2.4.5 


Blood type       O    A    B    AB
Frequency (%)   45   40   10     5


Assume that the blood types are distributed the same in both male and female populations.
Also assume that the blood types are independent of marriage.

(a) What is the probability that in a randomly chosen couple the wife has type B blood
and the husband has type O blood?

(b) It is known that a person with type B blood can safely receive transfusions only from
persons with type B or type O blood. What is the probability that a husband has type B or
type O blood? Given that a woman has type B blood, what is the probability that
her husband is an acceptable donor for her?


2.4.27. Suppose that there are 40 students in a statistics class and their blood type follows the 
percentage distribution given in Exercise 2.4.26. 
(a) If we randomly select two students from this class, what is the probability that both 
will have the same blood type?
(b) If we randomly select two students from this class and it is observed that the first
student's blood type is B+, what is the probability that the second student's blood
type is O+?


2.4.28. A rare nonlethal disease (ND) that develops during adolescence is believed to be associated
with a certain recessive genotype (aa) at a certain locus. It is known that in a population
5% of adults have the disease. Suppose that among the adults with the disease ND, 85%
have the aa genotype. Also suppose that among the adults without the disease, 2% of them
have the aa genotype. We have randomly selected an adult from this population.

(a) What is the probability that this person has the disease but not the aa genotype?

(b) What is the probability that this person has the aa genotype but not the disease
ND?

(c) Given that this person has the aa genotype, what is the probability that this person 
has the disease ND? 


2.4.29. (The gambler's ruin problem) Two gamblers, A and B, bet on the outcomes of successive
flips of a coin. On each flip, if the coin comes up heads, A collects one unit from B, whereas
if it comes up tails, A pays one unit to B. They continue to do this until one of them runs
out of money. If it is assumed that the successive flips of the coin are independent and each
flip results in a head with probability p, what is the probability that A winds up with all the
money if A starts with i units and B starts with N − i units?


2.5 RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS 


An experiment may contain numerous characteristics that can be measured. However, in most cases,
an experimenter will focus on some specific characteristics of the experiment. For example, a traffic 
engineer may focus on the number of vehicles traveling on a certain road or in a certain direction 
rather than the brand of vehicles or number of passengers in each vehicle. In general, each outcome 
of an experiment can be associated with a number by specifying a rule of association. The concept of 
a random variable allows us to pass from the experimental outcomes to a numerical function of the 
outcomes, often simplifying the sample space. 


Definition 2.5.1 A random variable (r.v.) X is a function defined on a sample space, S, that associates a
real number, $X(\omega) = x$, with each outcome $\omega$ in S.


Example 2.5.1
Two balanced coins are tossed and face values are noted. Then the sample space is S = {HH, HT, TH, TT}.
Define the random variable $X(\omega) = n$, where n is the number of heads and $\omega$ represents a simple event
such as HH. Then

\[
X(\omega) =
\begin{cases}
0, & \text{if } \omega = TT, \\
1, & \text{if } \omega \in \{HT, TH\}, \\
2, & \text{if } \omega = HH.
\end{cases}
\]

It can be noted that $X(\omega) = 0$ or $2$, each with probability 1/4 (w.p. 1/4), and $X(\omega) = 1$ w.p. 1/2.

It is important to note that in the definition of a random variable, probability plays no role. However,
as evidenced by the previous example, for each value or a set of values of the random variable, there 
are underlying collections of events, and through these events one connects the values of random 
variables with probability measures. 


The random variable is represented by a capital letter (X, Y, Z, .. .), and any particular real value of the 
random variable is denoted by the corresponding lowercase letter (x, y, z,...). We define two types 
of random variables, discrete and continuous. In this book, we will not deal with mixed random 
variables. 


Definition 2.5.2 A random variable X is said to be discrete if it can assume only a finite or countably 
infinite number of distinct values. 


Suppose an Internet business firm had 1000 hits on a particular day. Let the random variable X be
defined as the number of sales resulting on that day. Then X can take the values 0, 1, ..., 1000. If we
define a random variable as the number of telephone calls made from a large city on any given
day, then for all practical purposes it can be assumed to take the values 0, 1, 2, ....


Example 2.5.2 
In the tossing of three fair coins, let the random variable X be defined as X = number of tails. Then X can 
assume values 0, 1, 2, and 3. We can associate these values with probabilities in the following way: 

P(X = 0) = P({HHH}) = 1/8

P(X = 1) = P({HHT} ∪ {HTH} ∪ {THH}) = 3/8

P(X = 2) = P({TTH} ∪ {THT} ∪ {HTT}) = 3/8

P(X = 3) = P({TTT}) = 1/8.


We can write this in the tabular form


x    |  0  |  1  |  2  |  3
p(x) | 1/8 | 3/8 | 3/8 | 1/8
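
The table in Example 2.5.2 can be checked by brute-force enumeration. The following is a minimal sketch, assuming the eight head/tail sequences are equally likely; the names `outcomes` and `pmf` are illustrative and not part of the text.

```python
from fractions import Fraction
from itertools import product

# Sample space for three fair coins: all 8 sequences of H and T, equally likely.
outcomes = list(product("HT", repeat=3))

# X = number of tails; accumulate P(X = x) over the outcomes that give x tails.
pmf = {}
for w in outcomes:
    x = w.count("T")
    pmf[x] = pmf.get(x, Fraction(0)) + Fraction(1, len(outcomes))

for x in sorted(pmf):
    print(x, pmf[x])
```

Exact fractions are used so that the values can be compared directly with the entries 1/8, 3/8, 3/8, 1/8.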


Let X be a discrete random variable assuming values x1, x2, x3, .... We have the following. 


Definition 2.5.3 The discrete probability mass function (pmf) of a discrete random variable X is the
function

\[
p(x_i) = P(X = x_i), \quad i = 1, 2, 3, \ldots.
\]

A probability mass function (pmf) is more simply called a probability function (pf).


The cumulative distribution function (cdf) F of the random variable X is defined by

\[
F(x) = P(X \le x) = \sum_{\text{all } y \le x} p(y), \quad \text{for } -\infty < x < \infty.
\]

A cumulative distribution function is also called a probability distribution function or simply the
distribution function.


The probability function p(x) is nonnegative. In addition, because X must take on one of the values
in $\{x_1, x_2, x_3, \ldots\}$, we have $\sum_{i=1}^{\infty} p(x_i) = 1$. Although the pmf p(x) is defined only for a set of discrete
values $x_1, x_2, x_3, \ldots$, the cdf F(x) is defined for all real values x.


Example 2.5.3 
Suppose that a fair coin is tossed twice so that the sample space is S = {HH, HT, TH, TT}. Let X be number 
of heads. 
(a) Find the probability function for X. 
(b) Find the cumulative distribution function of X. 
Solution 
(a) We have 
\[
P(\{HH\}) = P(\{HT\}) = P(\{TH\}) = P(\{TT\}) = 1/4.
\]
Hence, the pmf is given by
\[
p(0) = P(X = 0) = 1/4, \quad p(1) = 1/2, \quad p(2) = 1/4.
\]
(b) For example,
\[
F(1.5) = P(X \le 1.5) = P(X = 0 \text{ or } 1) = P(X = 0) + P(X = 1) = \frac{1}{4} + \frac{1}{2} = \frac{3}{4}.
\]
Proceeding similarly, we obtain (as shown in Figure 2.5)
\[
F(x) =
\begin{cases}
0, & -\infty < x < 0 \\
1/4, & 0 \le x < 1 \\
3/4, & 1 \le x < 2 \\
1, & 2 \le x < \infty.
\end{cases}
\]


FIGURE 2.4 Random variable as a function (X maps the sample space S to R, the real line).

FIGURE 2.5 Graph of F(x).
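
The step-function cdf of Example 2.5.3 follows mechanically from the pmf by summing p(y) over all support points y ≤ x. The sketch below illustrates this, assuming the pmf of the example; the helper name `cdf` is hypothetical.

```python
from fractions import Fraction

# pmf of X = number of heads in two tosses of a fair coin (Example 2.5.3).
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def cdf(x):
    """F(x) = P(X <= x): add p(y) over every support point y with y <= x."""
    return sum((p for y, p in pmf.items() if y <= x), Fraction(0))

for x in (-1, 0, 0.5, 1.5, 2, 10):
    print(x, cdf(x))
```

Because the condition is `y <= x`, the function is right continuous: it jumps at 0, 1, and 2 and is flat in between.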


We have seen that a discrete random variable assumes a finite or countably infinite number of values. In contrast,
we define a continuous random variable as one that assumes uncountably many values, such as the
points on a real line. We now give the definition of a continuous random variable.


Definition 2.5.4 Let X be a random variable. Suppose that there exists a nonnegative real-valued function
$f : \mathbb{R} \to [0, \infty)$ such that for any interval [a, b],

\[
P(X \in [a, b]) = \int_a^b f(t)\,dt.
\]

Then X is called a continuous random variable. The function f is called the probability density
function (pdf) of X.


The cumulative distribution function (cdf) is given by

\[
F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt.
\]


For a given function f to be a pdf, it needs to satisfy the following two conditions: $f(x) \ge 0$ for all
values of x, and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
Also, if f is continuous, then $\frac{dF(x)}{dx} = f(x)$, where F(x) is the cdf. This follows from the fundamental
theorem of calculus. If f is the pdf of a random variable X, then

\[
P(a \le X \le b) = \int_a^b f(x)\,dx.
\]

Figure 2.6 represents $P(a \le X \le b)$.

As a result, for any real number a, P(X = a) = 0. Also,
\[
P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b).
\]

If we have the cdf F(x), then

\[
P(a \le X \le b) = F(b) - F(a).
\]

SOME PROPERTIES OF DISTRIBUTION FUNCTION
1. $0 \le F(x) \le 1$.
2. $\lim_{x \to -\infty} F(x) = 0$, and $\lim_{x \to \infty} F(x) = 1$.
3. F is a nondecreasing function, and right continuous.




Example 2.5.4 
Let the function

\[
f(x) =
\begin{cases}
\lambda x e^{-x}, & x > 0 \\
0, & \text{otherwise.}
\end{cases}
\]
(a) For what value of $\lambda$ is f a pdf?
(b) Find F(x).
Solution
(a) First note that $f(x) \ge 0$. Now, for f(x) to be a pdf, we need $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Because f(x) = 0
for $x \le 0$,

\[
\begin{aligned}
1 = \int_{-\infty}^{\infty} f(x)\,dx &= \int_0^{\infty} \lambda x e^{-x}\,dx = \lambda \int_0^{\infty} x e^{-x}\,dx \\
&= \lambda \left[ -x e^{-x} \Big|_0^{\infty} + \int_0^{\infty} e^{-x}\,dx \right] \quad \text{(using integration by parts)} \\
&= \lambda \left[ 0 + \left( -e^{-x} \right) \Big|_0^{\infty} \right] = \lambda.
\end{aligned}
\]
Therefore $\lambda = 1$. See Figure 2.7.

FIGURE 2.6 Probability as an area under a curve.

FIGURE 2.7 Graph of $f(x) = x e^{-x}$.

(b) The cumulative distribution function is

\[
F(x) = \int_{-\infty}^{x} f(t)\,dt =
\begin{cases}
0, & x < 0 \\
\int_0^x t e^{-t}\,dt = 1 - (x + 1)e^{-x}, & x \ge 0.
\end{cases}
\]

FIGURE 2.8 Graph of F(x), x ≥ 0.

Figure 2.8 represents the cumulative distribution.
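
The two results of Example 2.5.4 can be sanity-checked numerically. The sketch below is only an illustration: it assumes a plain composite midpoint rule (the helper `integrate` is hypothetical, not a method from the text) and truncates the infinite range at 50, where the remaining tail is negligible.

```python
import math

def f(x):
    """Density from Example 2.5.4 with lambda = 1."""
    return x * math.exp(-x) if x > 0 else 0.0

def F(x):
    """Closed-form cdf obtained in the example."""
    return 1.0 - (x + 1.0) * math.exp(-x) if x >= 0 else 0.0

def integrate(g, a, b, n=100_000):
    """Composite midpoint rule on [a, b] with n equal subintervals."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

total = integrate(f, 0.0, 50.0)      # should be close to 1
area_0_2 = integrate(f, 0.0, 2.0)    # should be close to F(2)
print(total, area_0_2, F(2.0))
```

Agreement between the numerical area and F(2) = 1 − 3e^{−2} is a quick check that the integration by parts was carried out correctly.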
Example 2.5.5 
Suppose that a large grocery store has shelf space for 150 cartons of fruit drink that are delivered on a
particular day of each week. The weekly sales of fruit drink show that the demand increases steadily up to
100 cartons and then levels off between 100 and 150 cartons. Let Y denote the weekly demand in hundreds
of cartons. It is known that the pdf of Y can be approximated by

\[
f(y) =
\begin{cases}
y, & 0 \le y \le 1 \\
1, & 1 < y \le 1.5 \\
0, & \text{elsewhere.}
\end{cases}
\]

(a) Find F(y),
(b) Find $P(0 \le Y \le 0.5)$,
(c) Find $P(0.5 \le Y \le 1.2)$.


Solution 
(a) The graph of the density function f(y) is shown in Figure 2.9.
From the definition of cdf, we have (Figure 2.10)

\[
F(y) = \int_{-\infty}^{y} f(t)\,dt =
\begin{cases}
0, & y < 0 \\
y^2/2, & 0 \le y < 1 \\
y - 1/2, & 1 \le y < 1.5 \\
1, & y \ge 1.5.
\end{cases}
\]

FIGURE 2.9 Graph of f(y).

FIGURE 2.10 Graph of cdf.

(b) The probability,

\[
P(0 \le Y \le 0.5) = F(0.5) - F(0) = (0.5)^2/2 = 1/8 = 0.125.
\]

(c)

\[
P(0.5 \le Y \le 1.2) = F(1.2) - F(0.5) = (1.2 - 1/2) - 0.125 = 0.575.
\]
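
Parts (b) and (c) of Example 2.5.5 use the rule P(a ≤ Y ≤ b) = F(b) − F(a). The sketch below encodes a piecewise cdf obtained by integrating the stated density (y²/2 on [0, 1), y − 1/2 on [1, 1.5)); the function name `F` and the variables are illustrative.

```python
def F(y):
    """Piecewise cdf for the demand density of Example 2.5.5 (y in hundreds of cartons)."""
    if y < 0:
        return 0.0
    if y < 1:
        return y * y / 2        # integral of t from 0 to y
    if y < 1.5:
        return y - 0.5          # 1/2 + integral of 1 from 1 to y
    return 1.0

p_b = F(0.5) - F(0.0)   # P(0 <= Y <= 0.5)
p_c = F(1.2) - F(0.5)   # P(0.5 <= Y <= 1.2)
print(p_b, p_c)
```

The two printed values should agree with 0.125 and 0.575 up to floating-point rounding.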


EXERCISES 2.5 


2.5.1. The probability function of a random variable Y is given by $p(i) = c\lambda^i / i!$, $i = 0, 1, 2, \ldots$,
where $\lambda$ is a known positive value and c is a constant.
(a) Find c. 
(b) Find P(Y = 0). 
(c) Find P(Y > 2). 


2.5.2. Find k so that the function given by

\[
p(x) = \frac{k}{x + 1}, \quad x = 1, 2, 3, 4
\]

is a probability function. Graph the density and cumulative distribution functions.
2.5.3. A random variable X has the following distribution:


| =|: 0 3 6 
p(x) | 0.2 | 0.1 | 0.4 | 0.3 


Find the cumulative distribution function F(x) and graph it. 


2.5.4. The cdf of a discrete random variable X is given in the following table: 


x    | −1  | 0    | 2   | 5   | 6
F(x) | 0.1 | 0.15 | 0.4 | 0.8 | 1


(a) Find P(X = 2). 
(b) Find P(X > 0).
2.5.5. The cumulative distribution function F(x) of a random variable X is given by

\[
F(x) =
\begin{cases}
0, & -\infty < x < -1 \\
0.2, & -1 \le x < 3 \\
0.8, & 3 \le x < 9 \\
1, & x \ge 9.
\end{cases}
\]


Write down the values of the random variable X and the corresponding probabilities, p(x). 
2.5.6. The probability density function of a random variable X is given by

\[
f(x) =
\begin{cases}
cx, & 0 < x < 4 \\
0, & \text{otherwise.}
\end{cases}
\]

(a) Find c.
(b) Find the distribution function F(x).
(c) Compute P(1 < X < 3).


2.5.7. Let the function

\[
f(x) =
\begin{cases}
cx^2, & 0 < x < 3 \\
0, & \text{otherwise.}
\end{cases}
\]


(a) Find the value of c so that f(x) is a density function. 
(b) Compute P(2 < X < 3). 
(c) Find the distribution function F(x). 


2.5.8. Suppose that Y is a continuous random variable whose pdf is given by

\[
f(y) =
\begin{cases}
K(4y - 2y^2), & 0 < y < 2 \\
0, & \text{elsewhere.}
\end{cases}
\]


(a) What is the value of K? 
(b) Find P(Y > 1). 
(c) Find F(y). 


2.5.9. The random variable X has a cumulative distribution function

\[
F(x) =
\begin{cases}
0, & \text{for } x \le 0 \\
\dfrac{x^2}{1 + x^2}, & \text{for } x > 0.
\end{cases}
\]

Find the probability density function of X.
2.5.10. A random variable X has a cumulative distribution function

\[
F(x) =
\begin{cases}
0, & \text{if } x < 0 \\
ax + b, & \text{if } 0 \le x < 3 \\
1, & \text{if } x \ge 3.
\end{cases}
\]
(a) Find the constants a and b.
(b) Find the pdf f(x). 
(c) Find P(1 < X <5). 


2.5.11. The amount of time, in hours, that a machine functions before breakdown is a continuous 
random variable with pdf 


\[
f(t) =
\begin{cases}
\frac{1}{120}\, e^{-t/120}, & t \ge 0 \\
0, & t < 0.
\end{cases}
\]


What is the probability that this machine will function between 98 and 145 hours before 
breaking down? What is the probability that it will function less than 160 hours? 


2.5.12. The length of time that an individual talks on a long-distance telephone call has been found 
to be of a random nature. Let X be the length of the talk; assume it to be a continuous 
random variable with probability density function given by 


ac US 20 
fa) = 
0, elsewhere. 
Find 
(a) The value of a that makes f(x) a probability density function. 
(b) The probability that this individual will talk (i) between 8 and 12 minutes, (ii) less 
than 8 minutes, (iii) more than 12 minutes. 
(c) Find the cumulative distribution function, F(x). 


2.5.13. Let T be the life length of a mechanical system. Suppose that the cumulative distribution of 
such a system is given by 


0, t<0O 
FQ) = : 
1 exp(-GY*), t>0,a>0,B,y>0. 


Find the probability density function that describes the failure behavior of such a system. 


2.6 MOMENTS AND MOMENT-GENERATING FUNCTIONS 


One of the most useful concepts in probability theory is that of expectation of a random variable. 
The expected value may be viewed as the balance point of distribution of probability on the real line, 
or in common language, the average. 


Definition 2.6.1 Let X be a discrete random variable with pf p(x). Then the expected value of X, denoted
by E(X), is defined by

\[
\mu = E(X) = \sum_{\text{all } x} x\,p(x), \quad \text{provided } \sum_{\text{all } x} |x|\,p(x) < \infty.
\]

Now we will define the expected value for a continuous random variable.


Definition 2.6.2 The expected value of a continuous random variable X with pdf f(x) is defined by

\[
\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx, \quad \text{provided } \int_{-\infty}^{\infty} |x| f(x)\,dx < \infty.
\]

The expected value of X is also called the expectation or mathematical expectation of X. We denote
the expected value of X by $\mu$.



Example 2.6.1 
Let

\[
X =
\begin{cases}
1, & \text{with probability } 1/2 \\
0, & \text{with probability } 1/2.
\end{cases}
\]

Then E(X) = 1(1/2) + 0(1/2) = 1/2.



Example 2.6.2 
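Let X be a discrete random variable whose probability function is given in the following table:

x    | −1  | 0   | 1    | 2   | 3    | 4   | 5
p(x) | 1/7 | 1/7 | 1/14 | 2/7 | 1/14 | 1/7 | 1/7

Find E(X).
Solution
By definition,

\[
E(X) = \sum_x x\,p(x) = (-1)\frac{1}{7} + 0 \cdot \frac{1}{7} + 1 \cdot \frac{1}{14} + 2 \cdot \frac{2}{7} + 3 \cdot \frac{1}{14} + 4 \cdot \frac{1}{7} + 5 \cdot \frac{1}{7} = 2.
\]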

Example 2.6.3 
Let $X \ge 0$ be an integer-valued random variable such that $P(X = n) = p_n$. Show that
$EX = \sum_{n=1}^{\infty} P(X \ge n)$.
Solution
Using the definition of expectation, and the fact that $(0)(p_0) = 0$, we have

\[
\begin{aligned}
EX = \sum_{n=1}^{\infty} n p_n &= 1p_1 + 2p_2 + 3p_3 + \cdots \\
&= p_1 + p_2 + p_3 + \cdots \\
&\quad + p_2 + p_3 + p_4 + \cdots \\
&\quad + p_3 + p_4 + \cdots \\
&\quad + \cdots \\
&= P(X \ge 1) + P(X \ge 2) + \cdots \\
&= \sum_{n=1}^{\infty} P(X \ge n).
\end{aligned}
\]
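
The tail-sum identity of Example 2.6.3 is easy to confirm on a small case. The sketch below uses a fair six-sided die as an assumed illustration (it is not an example from the text) and compares the direct expectation with the sum of the tail probabilities P(X ≥ n).

```python
from fractions import Fraction

# Illustrative distribution: a fair six-sided die, values 1..6.
pmf = {n: Fraction(1, 6) for n in range(1, 7)}

# Direct definition: E(X) = sum of n * p_n.
mean_direct = sum(n * p for n, p in pmf.items())

def tail(n):
    """P(X >= n) for the pmf above."""
    return sum(p for k, p in pmf.items() if k >= n)

# Tail-sum formula: E(X) = sum over n >= 1 of P(X >= n).
mean_tail = sum(tail(n) for n in range(1, max(pmf) + 1))

print(mean_direct, mean_tail)
```

Each probability p_n is counted once in each of the tails P(X ≥ 1), ..., P(X ≥ n), which is exactly the rearrangement used in the derivation.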
Example 2.6.4 
Suppose you are selling a car. Let $X_0, X_1, X_2, \ldots$ be the successive offers, occurring at times 0, 1, 2, ...,
that you receive (assume that the offers are random, independent, and have the same distribution); see
Figure 2.11. Show that $E(N) = \infty$, where $N = \min\{n : X_n > X_0\}$, that is, the first time an offer exceeds the
initial offer $X_0$ made at time 0.

FIGURE 2.11 Size of successive offerings.


Solution
By definition,

\[
P(N \ge n) = P(X_0 \text{ is the largest of } X_0, X_1, \ldots, X_{n-1}) = \frac{1}{n} \quad \text{by symmetry,}
\]

as any of the X's could be more than the rest. Hence, using Example 2.6.3,

\[
E(N) = \sum_{n=1}^{\infty} P(N \ge n) = \sum_{n=1}^{\infty} \frac{1}{n} = \infty.
\]

You would expect to wait a long time to receive an offer better than the first one. So, take the first offer.




Definition 2.6.3 The variance of a random variable X is defined by
\[
\sigma^2 = \mathrm{Var}(X) = E(X - \mu)^2.
\]

The square root of the variance, denoted by $\sigma$, is called the standard deviation.
The variance is a measure of spread or variability of values of a random variable around the mean. 


The next result shows how to obtain the expectation of a function of a random variable. 


EXPECTATION OF FUNCTION OF A RANDOM VARIABLE
Theorem 2.6.1 Let g(X) be a function of X. Then the expected value of g(X) is

\[
E[g(X)] =
\begin{cases}
\sum_x g(x)\,p(x), & \text{if X is discrete} \\
\int_{-\infty}^{\infty} g(x) f(x)\,dx, & \text{if X is continuous}
\end{cases}
\]

provided the sum or the integral exists.


We now give some properties of the expectation of a random variable. 


SOME PROPERTIES OF EXPECTED VALUE AND VARIANCE 


Theorem 2.6.2 Let c be a constant and let $g(X), g_1(X), \ldots, g_n(X)$ be functions of a random variable X such
that $E(g(X))$ and $E(g_i(X))$ for i = 1, 2, ..., n exist. Then the following results hold:

(a) $E(c) = c$.

(b) $E[cg(X)] = cE[g(X)]$.

(c) $E\left[\sum_i g_i(X)\right] = \sum_i E[g_i(X)]$.

(d) $\mathrm{Var}(aX + b) = a^2\,\mathrm{Var}(X)$ for constants a and b. In particular, $\mathrm{Var}(aX) = a^2\,\mathrm{Var}(X)$.

(e) $\mathrm{Var}(X) = E(X^2) - \mu^2$.


Proof. Proof of (a) through (d) will be given as an exercise. We will prove (e).

\[
\begin{aligned}
\mathrm{Var}(X) &= E(X - \mu)^2 \\
&= E(X^2 - 2X\mu + \mu^2) \\
&= E(X^2) - 2\mu E(X) + \mu^2 \\
&= E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2.
\end{aligned}
\]

Example 2.6.5
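A discrete random variable X is said to be uniformly distributed over the numbers 1, 2, 3, ..., n if
\[
P(X = i) = \frac{1}{n}, \quad i = 1, 2, \ldots, n.
\]
Find EX and Var(X).


Solution
By definition

\[
EX = \sum_{i=1}^{n} i \cdot \frac{1}{n} = \frac{1}{n}(1 + 2 + \cdots + n) = \frac{1}{n}\left[\frac{n(n+1)}{2}\right] = \frac{n+1}{2}.
\]

Similarly, using the summation formula $1^2 + 2^2 + \cdots + n^2 = \frac{n(n+1)(2n+1)}{6}$, we get

\[
EX^2 = \frac{1}{n}\left(1^2 + 2^2 + \cdots + n^2\right) = \frac{1}{n}\left[\frac{n(n+1)(2n+1)}{6}\right] = \frac{(n+1)(2n+1)}{6}.
\]

Hence,

\[
\mathrm{Var}(X) = EX^2 - (EX)^2 = \frac{(n+1)(2n+1)}{6} - \left(\frac{n+1}{2}\right)^2 = \frac{n^2 - 1}{12}.
\]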
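
The closed forms for the discrete uniform distribution in Example 2.6.5 can be checked for specific n. The sketch below computes EX and Var(X) = EX² − (EX)² directly from the pmf with exact fractions and compares them with (n + 1)/2 and the simplified variance (n² − 1)/12; the function name `uniform_moments` is illustrative.

```python
from fractions import Fraction

def uniform_moments(n):
    """Return (EX, Var X) for X uniform on 1, 2, ..., n, computed from the pmf."""
    p = Fraction(1, n)
    mean = sum(i * p for i in range(1, n + 1))
    second = sum(i * i * p for i in range(1, n + 1))
    return mean, second - mean ** 2

for n in (2, 6, 10):
    mean, var = uniform_moments(n)
    print(n, mean, var, Fraction(n + 1, 2), Fraction(n * n - 1, 12))
```

For a fair die (n = 6) this gives mean 7/2 and variance 35/12.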
Example 2.6.6
To find out the prevalence of smallpox vaccine use, a researcher asked 200 randomly selected people
aged 16 and over in an African village how many times they had been vaccinated. He obtained
the following figures: never, 17 people; once, 30; twice, 58; three times, 51; four times, 38; five times, 7.
Assuming these proportions hold for the whole population of that village, what is the
expected number of times a person in the village has been vaccinated, and what is the standard
deviation?


Solution 
Let X denote the random variable representing the number of times a person aged 16 or older in this village 
has been vaccinated. Then, we can obtain the following distribution: 


x 0 1 2 3 4 5 
p(x) | 17/200 | 30/200 | 58/200 | 51/200 | 38/200 | 7/200 


Then, 


\[
E(X) = \sum_x x\,p(x) = \frac{1}{200}\bigl(0(17) + 1(30) + 2(58) + 3(51) + 4(38) + 5(7)\bigr) = 2.43.
\]
Also,
\[
\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \sum_x x^2 p(x) - (2.43)^2 = 7.52 - (2.43)^2 = 1.6151.
\]

Thus, the standard deviation is $\sqrt{1.6151} = 1.2709$.
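
The arithmetic of Example 2.6.6 can be reproduced in a few lines. This is a sketch that follows the example as stated, dividing every count by 200; the variable names are illustrative.

```python
import math

# Reported counts: number of vaccinations -> number of people.
counts = {0: 17, 1: 30, 2: 58, 3: 51, 4: 38, 5: 7}
n = 200  # divisor used in the example (the counts as listed add up to 201)

mean = sum(x * c for x, c in counts.items()) / n        # E(X)
second = sum(x * x * c for x, c in counts.items()) / n  # E(X^2)
var = second - mean ** 2                                # Var(X) = E(X^2) - (E X)^2
sd = math.sqrt(var)

print(round(mean, 2), round(second, 2), round(var, 4), round(sd, 4))
```

Using the shortcut Var(X) = E(X²) − (E(X))² avoids computing each squared deviation separately.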




Example 2.6.7 
Let Y be a random variable with pdf

\[
f(y) =
\begin{cases}
\frac{3}{64}\, y^2 (4 - y), & 0 \le y \le 4 \\
0, & \text{elsewhere.}
\end{cases}
\]


(a) Find the expected value and variance of Y. 
(b) Let X = 300Y +50. Find E(X) and Var(X), and 
(c) Find P(X > 750).
Solution
(a)
\[
E(Y) = \int_{-\infty}^{\infty} y f(y)\,dy = \frac{3}{64}\int_0^4 y^3 (4 - y)\,dy = 2.4
\]
and
\[
\mathrm{Var}(Y) = \int_0^4 (y - 2.4)^2\, \frac{3}{64}\, y^2 (4 - y)\,dy = 0.64.
\]

(b) By linearity of expectation, E(X) = 300E(Y) + 50 = 300(2.4) + 50 = 770. Using the fact that $\mathrm{Var}(aY + b) = a^2\,\mathrm{Var}(Y)$, we have

\[
\mathrm{Var}(X) = (300)^2\,\mathrm{Var}(Y) = 90{,}000(0.64) = 57{,}600.
\]

(c)

\[
P(X > 750) = P(300Y + 50 > 750) = P\left(Y > \frac{7}{3}\right) = \frac{3}{64}\int_{7/3}^{4} y^2 (4 - y)\,dy = 0.55339.
\]
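
Because the density in Example 2.6.7 is a polynomial, every integral in the solution can be evaluated exactly. The sketch below stores f(y) = (3/64)(4y² − y³) as a power-to-coefficient mapping and integrates term by term; the helper `integral` and the variable names are hypothetical.

```python
from fractions import Fraction

# f(y) = (3/64) * (4*y**2 - y**3) on [0, 4], stored as {power: coefficient}.
density = {2: Fraction(3, 64) * 4, 3: Fraction(-3, 64)}

def integral(poly, a, b, extra_power=0):
    """Exact integral over [a, b] of y**extra_power * poly(y)."""
    total = Fraction(0)
    for k, c in poly.items():
        m = k + extra_power + 1
        total += c * (Fraction(b) ** m - Fraction(a) ** m) / m
    return total

mean_y = integral(density, 0, 4, 1)                 # E(Y)
var_y = integral(density, 0, 4, 2) - mean_y ** 2    # E(Y^2) - (E Y)^2
mean_x = 300 * mean_y + 50                          # E(300Y + 50)
var_x = 300 ** 2 * var_y                            # Var(300Y + 50)
p_tail = integral(density, Fraction(7, 3), 4)       # P(Y > 7/3) = P(X > 750)

print(mean_y, var_y, mean_x, var_x, float(p_tail))
```

Here the variance is obtained from E(Y²) − (E(Y))² rather than from the integral of (y − 2.4)² f(y); the two are equal by Theorem 2.6.2(e).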


2.6.1 Skewness and Kurtosis 


Even though the mean $\mu$ and the standard deviation $\sigma$ are significant descriptive measures that locate
the center and describe the spread or dispersion of the probability density function f(x), they do not
provide a unique characterization of the distribution. Two distributions may have the same mean
and variance and yet could be very different, as in Figure 2.12.


To better approximate the probability distribution of a random variable, we may need higher 
moments. 


FIGURE 2.12 Same mean and variance (two densities, each with mean = 1 and variance = 1).


Definition 2.6.4 The kth moment about the origin of a random variable X is defined as $E(X^k)$ and
denoted by $\mu'_k$, whenever it exists. The kth moment about its mean (also called the kth central moment)
of a random variable X is defined as $E[(X - \mu)^k]$ and denoted by $\mu_k$, k = 2, 3, 4, ..., whenever it exists.


In particular, we have $E(X) = \mu'_1 = \mu$, and $\sigma^2 = \mu_2$. We have seen earlier that the second moment
about the mean (the variance, $\sigma^2$) is used as a measure of dispersion about the mean.


Definition 2.6.5 The standardized third moment about the mean

\[
\alpha_3 = \frac{E(X - \mu)^3}{\sigma^3} = \frac{\mu_3}{\mu_2^{3/2}}
\]

is called the skewness of the distribution of X. The standardized fourth moment about the mean

\[
\alpha_4 = \frac{E(X - \mu)^4}{\sigma^4}
\]

is called the kurtosis of the distribution.


Skewness is used as a measure of the asymmetry (lack of symmetry) of a density function about
its mean. Recall that a distribution, or data set, is symmetric if it looks the same to the left and
right of the center point. If the distribution is symmetric about the mean, then $\alpha_3 = 0$; if $\alpha_3 > 0$,
the distribution has a longer right tail, and if $\alpha_3 < 0$, the distribution has a longer left tail. Thus,
the skewness of a normal distribution is zero. Kurtosis is a measure of whether the distribution is
peaked or flat relative to a normal distribution, and it is driven by the size of the distribution's tails.
It is known that the kurtosis of a normal distribution is $\alpha_4 = 3$, so comparisons are made against 3.
A kurtosis above 3 indicates more probability in the tails than for a normal distribution, whereas a
kurtosis below 3 indicates less. Distributions with relatively heavy tails are called
leptokurtic, and those with light tails are called platykurtic. A distribution that has the same kurtosis
as a normal distribution is known as mesokurtic.
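
To make the standardized moments concrete, the sketch below computes skewness and kurtosis for a discrete pmf. As an assumed illustration it uses the number of tails in three fair coin tosses (the distribution of Example 2.5.2); the function name `standardized_moments` is hypothetical.

```python
from fractions import Fraction

def standardized_moments(pmf):
    """Return (skewness, kurtosis) = (mu_3 / sigma^3, mu_4 / sigma^4) for a pmf dict."""
    mean = sum(x * p for x, p in pmf.items())

    def central(k):
        # kth moment about the mean
        return sum((x - mean) ** k * p for x, p in pmf.items())

    var = central(2)
    skewness = float(central(3)) / float(var) ** 1.5
    kurtosis = float(central(4) / var ** 2)
    return skewness, kurtosis

tails = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
skew, kurt = standardized_moments(tails)
print(skew, kurt)
```

This pmf is symmetric about its mean 3/2, so the skewness is 0, and its kurtosis is 7/3, which is below the normal value of 3.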


An important expectation is the moment-generating function of a random variable. In a sense, it
packages all the moments of a random variable in one expression.


Definition 2.6.6 For a random variable X, suppose that there is a positive number h such that for $-h < t < h$
the mathematical expectation $E(e^{tX})$ exists. The moment-generating function (mgf) of the random
variable X is defined by

\[
M_X(t) = E(e^{tX}) =
\begin{cases}
\sum_x e^{tx} p(x), & \text{if discrete} \\
\int_{-\infty}^{\infty} e^{tx} f(x)\,dx, & \text{if continuous.}
\end{cases}
\]


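An advantage of the moment-generating function is its ability to give the moments. Recall that the
Maclaurin series of the function $e^{tx}$ is

\[
e^{tx} = 1 + tx + \frac{(tx)^2}{2!} + \frac{(tx)^3}{3!} + \cdots + \frac{(tx)^n}{n!} + \cdots.
\]

By using the fact that the expected value of the sum equals the sum of the expected values, the
moment-generating function can be written as

\[
\begin{aligned}
M_X(t) = E\left[e^{tX}\right] &= E\left[1 + tX + \frac{(tX)^2}{2!} + \frac{(tX)^3}{3!} + \cdots + \frac{(tX)^n}{n!} + \cdots\right] \\
&= 1 + tE[X] + \frac{t^2}{2!}E[X^2] + \frac{t^3}{3!}E[X^3] + \cdots + \frac{t^n}{n!}E[X^n] + \cdots.
\end{aligned}
\]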


Taking the derivative of $M_X(t)$ with respect to t, we obtain

\[
\frac{dM_X(t)}{dt} = M'_X(t) = E[X] + tE[X^2] + \frac{t^2}{2!}E[X^3] + \cdots + \frac{t^{n-1}}{(n-1)!}E[X^n] + \cdots.
\]

Evaluating this derivative at t = 0, all terms except E[X] become zero. We have
\[
M'_X(0) = E[X].
\]
Similarly, taking the second derivative of $M_X(t)$, we obtain
\[
M''_X(0) = E[X^2].
\]

Continuing in this manner, from the nth derivative of $M_X(t)$ with respect to t, we obtain all the moments
to be

\[
M_X^{(n)}(0) = E[X^n], \quad n = 1, 2, 3, \ldots.
\]


We summarize these calculations in the following theorem.
Theorem 2.6.3 If $M_X(t)$ exists, then for any positive integer k,

\[
\left.\frac{d^k M_X(t)}{dt^k}\right|_{t=0} = M_X^{(k)}(0) = \mu'_k.
\]

The usefulness of the foregoing theorem lies in the fact that, if the mgf can be found, the often difficult
process of integration or summation involved in calculating different moments can be replaced by
the much easier process of differentiation. The following examples illustrate this fact.


Example 2.6.8 
Let X be a random variable with pf

\[
p(x) = \binom{n}{x} p^x (1 - p)^{n - x}, \quad x = 0, 1, 2, \ldots, n.
\]

(This random variable is called a binomial random variable, and the pf is called a binomial distribution.)
Show that $M_X(t) = [(1 - p) + pe^t]^n$, for all real values of t. Also obtain the mean and variance of the random
variable X.


Solution
The moment-generating function of X is

\[
M_X(t) = E(e^{tX}) = \sum_{x=0}^{n} e^{tx} \binom{n}{x} p^x (1 - p)^{n - x}
= \sum_{x=0}^{n} \binom{n}{x} (pe^t)^x (1 - p)^{n - x}.
\]

Using the binomial formula, we have
\[
M_X(t) = [pe^t + (1 - p)]^n, \quad -\infty < t < \infty.
\]
The first two derivatives of $M_X(t)$ are
\[
M'_X(t) = n[(1 - p) + pe^t]^{n-1}(pe^t)
\]
and
\[
M''_X(t) = n(n - 1)[(1 - p) + pe^t]^{n-2}(pe^t)^2 + n[(1 - p) + pe^t]^{n-1}(pe^t).
\]
Thus,

\[
\mu = E(X) = M'_X(0) = np
\]
and
\[
\sigma^2 = E(X^2) - \mu^2 = M''_X(0) - (np)^2 = n(n - 1)p^2 + np - (np)^2 = np(1 - p).
\]
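
The relation between derivatives of the mgf at 0 and the moments can be illustrated numerically. The sketch below approximates M'(0) and M''(0) for the binomial mgf of Example 2.6.8 by central finite differences; this is an assumed numerical stand-in for the symbolic differentiation in the text, and the parameter values n = 10, p = 0.3 and the step h are arbitrary illustrative choices.

```python
import math

def mgf_binomial(t, n, p):
    """Binomial mgf from Example 2.6.8: [(1 - p) + p e^t]^n."""
    return ((1 - p) + p * math.exp(t)) ** n

def first_two_moments(mgf, h=1e-4):
    """Approximate M'(0) and M''(0) by central finite differences with step h."""
    m1 = (mgf(h) - mgf(-h)) / (2 * h)
    m2 = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2
    return m1, m2

n, p = 10, 0.3
m1, m2 = first_two_moments(lambda t: mgf_binomial(t, n, p))
variance = m2 - m1 ** 2

print(m1, variance)   # close to np = 3 and np(1 - p) = 2.1
```

The finite-difference values agree with np and np(1 − p) only up to a small discretization and rounding error, so they should be compared with a tolerance rather than for exact equality.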




Example 2.6.9 
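Let X be a random variable with pmf $f(x) = e^{-\lambda}\lambda^x / x!$, x = 0, 1, 2, .... (Such a random variable is called
a Poisson r.v. and the distribution is called a Poisson distribution with parameter $\lambda$.) Find the mgf of X.


Solution
By definition

\[
\begin{aligned}
M_X(t) = E e^{tX} &= \sum_{x=0}^{\infty} e^{tx} f(x)
= \sum_{x=0}^{\infty} e^{tx}\, \frac{e^{-\lambda}\lambda^x}{x!}
= e^{-\lambda} \sum_{x=0}^{\infty} \frac{(\lambda e^t)^x}{x!} \\
&= e^{-\lambda} e^{\lambda e^t} \sum_{x=0}^{\infty} \frac{e^{-\lambda e^t}(\lambda e^t)^x}{x!}
= e^{\lambda(e^t - 1)} \sum_{x=0}^{\infty} \frac{e^{-\lambda e^t}(\lambda e^t)^x}{x!}. \qquad (1)
\end{aligned}
\]
We observe that $e^{-\lambda e^t}(\lambda e^t)^x / x!$ is a Poisson pf with parameter $\lambda e^t$. Hence $\sum_{x=0}^{\infty} e^{-\lambda e^t}(\lambda e^t)^x / x! = 1$. Thus,
from (1),
\[
M_X(t) = e^{\lambda(e^t - 1)}.
\]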
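
The closed form for the Poisson mgf can be compared with a truncated version of the defining series. The sketch below is an illustration under assumptions: the truncation at 200 terms and the parameter value λ = 2.5 are arbitrary choices, and the function names are hypothetical.

```python
import math

def poisson_mgf_series(t, lam, terms=200):
    """Truncated sum of e^{tx} e^{-lam} lam^x / x! over x = 0, ..., terms - 1."""
    total = 0.0
    term = math.exp(-lam)   # term holds e^{-lam} (lam e^t)^x / x!, starting at x = 0
    for x in range(terms):
        total += term
        term *= lam * math.exp(t) / (x + 1)
    return total

def poisson_mgf_closed(t, lam):
    """Closed form from Example 2.6.9: exp(lam (e^t - 1))."""
    return math.exp(lam * (math.exp(t) - 1.0))

lam = 2.5
for t in (-1.0, 0.0, 0.5, 1.0):
    print(t, poisson_mgf_series(t, lam), poisson_mgf_closed(t, lam))
```

Updating each term from the previous one avoids computing large factorials and powers separately.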
Example 2.6.10 
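Let X be a random variable with pdf given by

\[
f(x) =
\begin{cases}
\frac{1}{\beta}\, e^{-x/\beta}, & x > 0 \\
0, & \text{otherwise.}
\end{cases}
\]

Find the mgf $M_X(t)$.
Solution
By definition of the mgf,

\[
\begin{aligned}
M_X(t) &= \int_{-\infty}^{\infty} e^{tx} f(x)\,dx
= \frac{1}{\beta}\int_0^{\infty} e^{-\left(\frac{1}{\beta} - t\right)x}\,dx \\
&= \frac{1}{\beta}\left[-\frac{1}{(1/\beta) - t}\, e^{-\left(\frac{1}{\beta} - t\right)x}\right]_{x=0}^{\infty}
= \frac{1}{\beta} \cdot \frac{\beta}{1 - \beta t} = \frac{1}{1 - \beta t}, \quad t < \frac{1}{\beta}.
\end{aligned}
\]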

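Example 2.6.11 
Let X be a random variable with pdf $f(x) = (1/\sqrt{2\pi})\, e^{-x^2/2}$, $-\infty < x < \infty$. (We call such a random 
variable a standard normal random variable.) Find the mgf of X. 


Solution 
By the definition of the mgf, we have 

$$\begin{aligned}
E(e^{tX}) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{tx} e^{-x^2/2}\,dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}(x^2 - 2tx)}\,dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}(x^2 - 2tx + t^2) + \frac{1}{2}t^2}\,dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}(x - t)^2 + \frac{1}{2}t^2}\,dx \\
&= e^{t^2/2}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}(x - t)^2}\,dx = e^{t^2/2},
\end{aligned}$$

as $(1/\sqrt{2\pi})\, e^{-\frac{1}{2}(x-t)^2}$ is a normal pdf with mean t and variance 1, and hence 
$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} e^{-\frac{1}{2}(x-t)^2}\,dx = 1$. 
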

A random variable X with pdf 

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty,$$

is called a normal random variable with mean $\mu$ and variance $\sigma^2$. We will denote such random variables by 
$X \sim N(\mu, \sigma^2)$. 


PROPERTIES OF THE MOMENT-GENERATING FUNCTION 
1. The moment-generating function of X is unique in the sense that, if two random variables X and Y 
have the same mgf ($M_X(t) = M_Y(t)$, for t in an interval containing 0), then X and Y have the same 
distribution. 
2. If X and Y are independent, then 

$$M_{X+Y}(t) = M_X(t)\,M_Y(t).$$

That is, the mgf of the sum of two independent random variables is the product of the mgfs of the 
individual random variables. The result can be extended to n random variables. 
3. Let Y = aX + b. Then 

$$M_Y(t) = e^{bt} M_X(at).$$

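Properties 2 and 3 can be illustrated with finite discrete distributions, where every mgf is a finite sum. The sketch below is an illustration under assumed names and parameter values, not the book's method: it builds the pf of X + Y by convolution for independent binomial variables and compares mgfs at a few values of t.

```python
import math


def binomial_pmf(n, p):
    """List of P(X = x) for x = 0, ..., n."""
    return [math.comb(n, x) * p**x * (1.0 - p) ** (n - x) for x in range(n + 1)]


def mgf(pmf, t):
    """mgf of a random variable taking the values 0, 1, ..., len(pmf) - 1."""
    return sum(math.exp(t * x) * px for x, px in enumerate(pmf))


def convolve(pmf_x, pmf_y):
    """pf of X + Y when X and Y are independent with the given pfs."""
    out = [0.0] * (len(pmf_x) + len(pmf_y) - 1)
    for i, px in enumerate(pmf_x):
        for j, py in enumerate(pmf_y):
            out[i + j] += px * py
    return out


def mgf_affine(pmf, a, b, t):
    """mgf of aX + b computed directly from the definition E(e^{t(aX + b)})."""
    return sum(math.exp(t * (a * x + b)) * px for x, px in enumerate(pmf))


if __name__ == "__main__":
    px, py = binomial_pmf(3, 0.4), binomial_pmf(5, 0.4)  # hypothetical choices
    t = 0.5
    # Property 2: mgf of the sum equals the product of the mgfs.
    print(mgf(convolve(px, py), t), mgf(px, t) * mgf(py, t))
    # Property 3 with a = 2, b = 1: direct value next to e^{bt} M_X(at).
    print(mgf_affine(px, 2, 1, t), math.exp(1 * t) * mgf(px, 2 * t))
```

In this particular case the product of the two mgfs is also the mgf of a binomial distribution with parameters (8, 0.4), so by property 1 the sum has that distribution.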
Example 2.6.12 
Find the mgf of $X \sim N(\mu, \sigma^2)$. 
Solution 
Let $Y \sim N(0, 1)$ and let $X = \sigma Y + \mu$. Then by the foregoing property (3) and Example 2.6.11, the mgf of 
X is 

$$M_X(t) = e^{\mu t} M_Y(\sigma t) = e^{\mu t}\, e^{\frac{1}{2}\sigma^2 t^2} = e^{\mu t + \frac{1}{2}\sigma^2 t^2}.$$


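Example 2.6.13 
Let $X_1 \sim N(\mu_1, \sigma_1^2)$ and $X_2 \sim N(\mu_2, \sigma_2^2)$. Let $X_1$ and $X_2$ be independent. Find the mgf of $Y = X_1 + X_2$ and 
obtain the distribution of Y. 

Solution 
By property (2), 

$$M_Y(t) = M_{X_1}(t)\,M_{X_2}(t) = \left(e^{\mu_1 t + \frac{1}{2}\sigma_1^2 t^2}\right)\left(e^{\mu_2 t + \frac{1}{2}\sigma_2^2 t^2}\right) = e^{(\mu_1 + \mu_2)t + \frac{1}{2}(\sigma_1^2 + \sigma_2^2)t^2}.$$

This implies $Y \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$. 


This result can be generalized. If $X_1, \ldots, X_n$ are independent random variables such that 
$X_i \sim N(\mu_i, \sigma_i^2)$, i = 1, 2, ..., n, then we can show that $\sum_{i=1}^{n} a_i X_i \sim N\left(\sum_{i=1}^{n} a_i \mu_i, \sum_{i=1}^{n} a_i^2 \sigma_i^2\right)$. 
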
EXERCISES 2.6 


2.6.1. Find E(X) where X is the outcome when one rolls a six-sided balanced die. Find the mgf of 
X. Also, using the mgf of X, compute the variance of X. 


2.6.2. The grades from a statistics class for the first test are given by 


x    | 96   | 87   | 65   | 49   | 77   | 74   | 99   | 68   | 56   | 84 
p(x) | 3/15 | 2/15 | 1/15 | 1/15 | 2/15 | 1/15 | 1/15 | 1/15 | 1/15 | 2/15 


(a) Find the mean $\mu$ and variance $\sigma^2$. 
(b) Find the mgf. 


2.6.3. The cdf of a discrete random variable Y is given in the following table: 


y —1 0 2 6 
F(y) | 0.1 | 0.15 | 0.4 | 0.8 | 1 


(a) Find $EY$, $EY^2$, $EY^3$, and Var(Y). 
(b) Find the mgf of Y. 


2.6.4. A discrete random variable X is such that 


n-1 
EO a WM 1,2) cag Nein 


Show that EX = 3. 


2.6.5. A discrete random variable X is such that 


Show that $EX = \infty$. That is, X has no mathematical expectation. 

2.6.6. Let X be a random variable with pdf $f(x) = kx^2$ where 0 < x < 1. 
(a) Find k. 
(b) Find E(X) and Var(X). 
(c) Find $M_X(t)$. Using the mgf, find E(X). 


2.6.7. Let X be a random variable with pdf $f(x) = ax^2 + b$, 0 < x < 1. Find a and b such that 
E(X) = 3. 


2.6.8. Given that $X_1$, $X_2$, $X_3$, and $X_4$ are independent random variables with mean 2 and variance 
4, find E(Y) and E(Z) for 


Y =3X4—X1+4X3 


Z = X2+ 7X3 —-9X}. 


2.6.9. For a random variable X, prove (a)-(d) of Theorem 2.6.2. 


2.6.10. Let $\varepsilon$ (for “error”) be a random variable with $E(\varepsilon) = 0$ and $\mathrm{Var}(\varepsilon) = \sigma^2$. Define the random 
variable $X = \mu + \varepsilon$, where $\mu$ is a constant. Find E(X), Var(X), and $E(\varepsilon^2)$. 


2.6.11. A degenerate random variable is a random variable taking a constant value. Let X = c. Show 
that E(X) = c, and Var(X) = 0. Also find the cumulative distribution function of the 
degenerate distribution of X. 


2.6.12. Let Y : N(j, 0). Use the megf to find E(X?) and E(X*). 


2.6.13. Using Theorem 2.6.3, show that the mean and the variance of the Poisson distribution with 
parameter $\lambda$ are both equal to $\lambda$. 


2.6.14. Let X be a discrete random variable with a mass function 

$$p(x) = \begin{cases} \dfrac{1}{x(x+1)}, & x = 1, 2, \ldots, \\ 0, & \text{otherwise.} \end{cases}$$

Show that the moment-generating function does not exist for this random variable. 


2.6.15. Let X be a random variable with geometric pdf 

$$f(x) = p(1-p)^{x-1}, \quad x = 1, 2, 3, \ldots.$$

(a) Find E(X) and Var(X). 
(b) Show that 

$$M_X(t) = \frac{pe^t}{1 - (1-p)e^t}, \quad t < -\ln(1-p).$$


2.6.16. Find E(X) and Var(X) for a random variable X with pdf $f(x) = \frac{1}{2}e^{-|x|}$, $-\infty < x < \infty$. 


2.6.17. The probability density function of the random variable X is given by 


%, 0<x<1, 
2 
Gx— 24 le 1<x<2, 
f(x) = , 
er 2<x<3, 
0, otherwise. 


Find the expected value of the random variable X. 


2.6.18. Let the random variable X be normally distributed with mean 0 and variance $\sigma^2$. Show that 
$E(X^{2k+1}) = 0$, where k = 0, 1, 2, .... 


2.6.19. Find the mgf of the random variable X with pdf $f(x) = \frac{1}{2}e^{-|x|}$, $-\infty < x < \infty$. 


2.6.20. If the kth moment of a random variable exists, show that all moments of order less than k 
exist. 


2.6.21. Suppose that the random variable X has an mgf 

$$M_X(t) = \frac{a}{a - t}, \quad t < a.$$

Let the random variable Y have the following function for its probability density: 

$$g(y) = \begin{cases} ae^{-ay}, & y > 0,\ a > 0, \\ 0, & \text{otherwise.} \end{cases}$$

Can we obtain the probability density of the variable X with the foregoing information? 


2.7 CHAPTER SUMMARY 


In this chapter, we have introduced the concepts of random events and probability, and how to compute 
the probabilities of events using counting techniques. We have studied the concepts of conditional 
probability, independence, and Bayes’ rule. Random variables and distribution functions, moments, 
and moment-generating functions of random variables have also been introduced. 


The following lists some of the key definitions introduced in this chapter. 


■ Sample space 
■ Mutually exclusive events 
■ Informal definition of probability 
■ Classical definition of probability 
■ Frequency interpretation of probability 
■ Axiomatic definition of probability 
■ Multinomial coefficients 
■ Conditional probability 
■ Mutually independent events 
■ Pairwise independent events 
■ Random variable (r.v.) 
■ Discrete random variable 
■ Discrete probability mass function 
■ Cumulative distribution function 
■ Continuous random variable 
■ Expected value 
■ kth moment about the origin 
■ kth moment about its mean 
■ Skewness and kurtosis 
■ Moment-generating function 


The following important concepts and procedures have been discussed in this chapter: 


■ Method of computing probability by the classical approach 
■ Some basic properties of probability 
■ Computation of probability using counting techniques 
■ Four sampling methods: 
   ■ Sampling with replacement and the objects are ordered 
   ■ Sampling without replacement and the objects are ordered 
   ■ Sampling without replacement and the objects are not ordered 
   ■ Sampling with replacement and the objects are not ordered 
■ Permutation of n objects taken m at a time 
■ Combinations of n objects taken m at a time 
■ Number of combinations of n objects into m classes 
■ Some properties of conditional probability 
■ Law of total probability 
■ Steps to apply Bayes’ rule 
■ Some properties of distribution function 
■ Some properties of expected value 
■ Expectation of function of a random variable 
■ Properties of moment-generating functions 


2.8 COMPUTER EXAMPLES (OPTIONAL) 


The three software packages that we are using in this book, Minitab, SPSS, and SAS, are not specifically 
designed for probability computations. However, the following examples are given to demonstrate 
that the software can be used for some basic probability computations. We do not recommend 
using any of these three software packages for probability calculations; they are basically 
designed for statistical computations. There are many other software packages, such as Maple or 
MATLAB, that can be used efficiently for probability computations. 


2.8.1 Minitab Computations 

In order to find the cdf of a random variable, we can use the commands shown in Example 2.8.1. 

We can use mathematical expressions to find the expected value of a discrete random variable. 


Example 2.8.1 
A random variable X has the following distribution: 


x    | 1   | 4   | 5   | 8    | 11 
p(x) | 0.2 | 0.2 | 0.1 | 0.15 | 0.35 


Find P(X ≤ 4). 


Solution 
Enter x values in C1 and p(x) values in C2. 


Calc > Probability Distributions > Discrete. .. > click Cumulative probability, and in Values in: 
enter C1, Probabilities in: enter C2, click input column: enter C1, in Optional storage: enter 
C3 > OK 


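We will get the following output in column C3. 


0.20 0.40 0.50 0.65 1.00 


Each entry is the cumulative probability P(X ≤ x) at the corresponding value in C1; for example, the 
second entry gives P(X ≤ 4) = 0.40. 
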
Example 2.8.2 
For the random variable X in Example 2.8.1, find E(X). 


Solution 
Enter x values in column C1 (i.e., 1, 4, 5, 8, 11), and enter p(x) values in column C2. Use the following procedure. 


Calc > Calculator... > Store results in variable: type C3 > in Expression: type (C1)*(C2) > click OK 
Then to find the sum of values in column C3 > Calc > Column Statistics... > click Sum and in Input 
variable: type C3 > click OK 


We will get the output as 


Column Sum 
Sum of C3 = 6.5500 


Note that this Sum gives E(X). In the previous procedure, if we store the expression 
(C1)*(C1)*(C2) in column C4 and find the sum of terms in C4, we will get $E(X^2)$. Using this, 
we will be able to compute Var(X). Using a similar procedure, we can obtain $E(X^n)$ for any n ≥ 1. 
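For readers without Minitab, the same column arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are assumptions, and the x values 1, 4, 5, 8, 11 are read from the data-entry instruction in Example 2.8.2.

```python
def cdf_values(ps):
    """Running totals P(X <= x), assuming the x values are listed in increasing order."""
    total, out = 0.0, []
    for prob in ps:
        total += prob
        out.append(total)
    return out


def raw_moment(xs, ps, k=1):
    """E(X^k) as the sum of x^k * p(x), mirroring the product column summed in Minitab."""
    return sum(x**k * prob for x, prob in zip(xs, ps))


xs = [1, 4, 5, 8, 11]
ps = [0.2, 0.2, 0.1, 0.15, 0.35]

mean = raw_moment(xs, ps, 1)      # corresponds to the sum of column C3
second = raw_moment(xs, ps, 2)    # corresponds to the sum of column C4
variance = second - mean**2       # Var(X) = E(X^2) - [E(X)]^2

if __name__ == "__main__":
    print([round(c, 2) for c in cdf_values(ps)])  # compare with the Minitab column in Example 2.8.1
    print(round(mean, 4), round(second, 4), round(variance, 4))
```

Higher moments follow by changing k, in the same way that a further product column would be added in Minitab.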


2.8.2 SPSS Examples 


Example 2.8.3 
For the random variable X in Example 2.8.1, find E(X). 


Solution 
In column 1, enter the x values and in column 2 enter the p(x) values. Then 


Transform > Compute... > in Target Variable: type a name, say, product. Move var00001 and 
var00002 to the Numeric Expression: field and put “*” between them as (var00001)*(var00002). 
Then use the SUM(., .) command to find the value of E(X). 


2.8.3 SAS Examples 




Example 2.8.4 
A random variable X has the following distribution: 


x    | 2   | 5   | 6   | 8   | 9 
p(x) | 0.1 | 0.2 | 0.3 | 0.1 | 0.3 


Using SAS, find E(X). 


Solution 
For discrete distributions where the random variable takes finitely many values, we can adapt the following procedure: 


data evalue; 
input x y n; 
z=x*y*n; 
cards; 
2 .1 5 
5 .2 5 
6 .3 5 
8 .1 5 
9 .3 5 
; 
run; 
proc means; 
run; 


Here x holds the values of X, y holds the probabilities p(x), and n = 5 is the number of values X takes. 
If proc means were used just for x*y, it would give us $\frac{1}{n}\sum x\,p(x)$; hence, multiplying by n 
gives us $E(X) = \sum x\,p(x)$ as the mean of z. We will get the following output: 


The MEANS Procedure 


Variable   N        Mean     Std Dev     Minimum      Maximum 
x          5   6.0000000   2.7386128   2.0000000    9.0000000 
y          5   0.2000000   0.1000000   0.1000000    0.3000000 
n          5   5.0000000           0   5.0000000    5.0000000 
z          5   6.5000000   4.8476799   1.0000000   13.5000000 


From this, we can see that E(X) = 6.5. A direct way to find the expected value is by using “PROC 
IML.” 


options nodate nonumber; 
/* Finding expected value of a random variable */ 
proc iml; 
/* defining all the variables */ 
x={2 5 6 8 9}; /* a row vector */ 
y={.1 .2 .3 .1 .3}; /* probabilities */ 
/* calculations */ 
z=x*y`; /* row vector times transposed row vector: sum of x*p(x) */ 
/* print statements */ 
print "Display the vector x and probability y and the expected value"; 
print x y, z; 
quit; 


We will get the following output: 


0.1 0.2 0.3 0.1 0.3 
Z 
6.5 


PROJECTS FOR CHAPTER 2 

2A. The Birthday Problem 


The famous birthday problem is to find the smallest number of people one must ask to get an even 
chance that at least two people have the same birthday. To solve this you can use the following steps. 


Find the probability that in a group of k people no two have the same birthday. Let q be this 
probability. Then p = 1 - q is the probability that at least two people have the same birthday. 
Ignoring leap years, take the sample space S as all sequences of length k with each element one of 
the 365 days in the year. Thus there are $365^k$ elements in S. 


(a) Find the total number of sequences with no common birthdays. 
(b) Assuming that each sequence is equally likely, show that 

$$q = \frac{(365)(364)\cdots(365 - k + 1)}{365^k}.$$

(c) Write a computer program for calculating q for k = 2 to 50, and find the first k for which 
p > 0.5. This will give the least number of people we should ask to make it an even chance 
that at least two people will have the same birthday. 
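One possible shape for the program requested in part (c) is sketched below. It is a starting point rather than a model solution: the function names and the default arguments are assumptions, and q is built up as a running product so that neither factorials nor the power of 365 are ever formed.

```python
def prob_all_distinct(k, days=365):
    """q: probability that k people all have different birthdays (leap years ignored)."""
    q = 1.0
    for i in range(k):
        q *= (days - i) / days   # factors 365/365, 364/365, ..., (365 - k + 1)/365
    return q


def smallest_group(threshold=0.5, days=365, k_max=50):
    """First k in 2..k_max with p = 1 - q above the threshold, or None if there is none."""
    for k in range(2, k_max + 1):
        if 1.0 - prob_all_distinct(k, days) > threshold:
            return k
    return None


if __name__ == "__main__":
    for k in range(2, 51):
        q = prob_all_distinct(k)
        print(k, round(q, 4), round(1.0 - q, 4))
    print("first k with p > 0.5:", smallest_group())
```

Multiplying the ratios one at a time keeps every intermediate value between 0 and 1, which avoids overflow for larger k.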


2B. The Hardy-Weinberg Law 


Hereditary traits in offspring depend on a pair of genes, one each contributed by the father and the 
mother. A gene is either a dominant allele, denoted by A, or a recessive allele, denoted by a. If the 
genotype is AA, Aa, or aA, then the hereditary trait is A, and if the genotype is aa, then the hereditary 
trait is a. Suppose that the probabilities of the mother carrying the genotypes aa, aA (same as Aa), 
and AA are p, q, and r, respectively. Here p + q + r = 1. The same probabilities are true for the father. 


(a) Assuming that the genetic contributions of the mother and father are independent and the 
matings are random, show that the respective probabilities for the first-generation offspring 
are 

$$p_1 = (p + q/2)^2, \quad q_1 = 2(r + q/2)(p + q/2), \quad r_1 = (r + q/2)^2.$$

Also find P(A) and P(a). 

(b) The Englishman G. H. Hardy and the German W. Weinberg could show that the foregoing 
probabilities in a population stay constant for generations if certain conditions are fulfilled. 
This is known as the Hardy-Weinberg law. Under the conditions of part (a), using the induction 
argument, show that the Hardy-Weinberg law is satisfied, i.e., $p_n = p_1$, $q_n = q_1$, and 
$r_n = r_1$ for all n > 1. The consequences of the Hardy-Weinberg law are that (i) no evolutionary 
change occurs through the process of sexual reproduction itself, and (ii) changes in 
allele and genotype frequencies can result only from additional forces on the gene pool of a 
species. 


Objective: In this chapter we present some special distributions, joint distributions of several random 
variables, functions of random variables, and some important limit theorems. 


Johann Carl Friedrich Gauss 
(Source: http://tobiasamuel.files.wordpress.com/2008/06/carl_friedrich_gauss.jpg ) 


German mathematician and physicist Carl Friedrich Gauss (1777-1855) is sometimes called the 
“prince of mathematics.” He was a child prodigy. At the age of 7, Gauss started elementary school, 
and his potential was noticed almost immediately. His teachers were amazed when Gauss summed the 
integers from 1 to 100 instantly. At age 24, Gauss published one of the most brilliant achievements 
in mathematics, Disquisitiones Arithmeticae (1801). In it, Gauss systematized the study of number 
theory. Gauss applied many of his mathematical insights in the field of astronomy, and by using 
the method of least squares he successfully predicted the location of the asteroid Ceres in 1801. In 
1820 Gauss made important inventions and discoveries in geodesy, the study of the shape and size of 
the earth. In statistics, he developed the idea of the normal distribution. In the 1830s he developed 
theories of non-Euclidean geometry and mathematical techniques for studying the physics of fluids. 
Although Gauss made many contributions to applied science, especially electricity and magnetism, 
pure mathematics was his first love. It was Gauss who first called mathematics “the queen of the 
sciences.” 


3.1 INTRODUCTION 


In the previous chapter, we looked at the basic concepts of probability calculations, random variables, 
and their distributions. There are many special distributions that have useful applications in statistics. 
It is worth knowing the type of distribution that we can expect under different circumstances, because 
a better knowledge of the population will result in better inferential results. In the next section, we 
discuss some of these distributions with some additional distributions presented in Appendix A3. 
We also briefly deal with joint distributions of random variables and functions of random variables. 
Limit theorems play an important role in statistics. We will present two limit theorems: the law of 
large numbers and the Central Limit Theorem. 


3.2 SPECIAL DISTRIBUTION FUNCTIONS 


Random variables are often classified according to their probability distribution functions. In any 
analysis of quantitative data, it is a major step to know the form of the underlying probability 
distributions. There are certain basic probability distributions that are applicable in many diverse 
contexts and thus repeatedly arise in practice. A great variety of special distributions have been stud- 
ied over the years. Also, new ones are frequently being added to the literature. It is impossible to 
give a comprehensive list of distribution functions in this book. There are many books and Web sites 
that deal with a range of distribution functions. A good list of distributions can be obtained from 
http://www.causascientia.org/math_stat/Dists/Compendium.pdf. In this section, we will describe 
some of the commonly used probability distributions. In Appendix A3, we list some more distri- 
butions with their mean, variance, and moment-generating functions. First we discuss some discrete 
probability distributions. 


3.2.1 The Binomial Probability Distribution 


The simplest distribution is the one with only two possible outcomes. For example, when a coin (not 
necessarily fair) is tossed, the outcomes are heads or tails, with each outcome occurring with some 
positive probability. These two possible outcomes may be referred to as “success” if heads occurs and 
“failure” if tails occurs. Assume that the probability of heads appearing in a single toss is p; then 
the probability of tails is 1 - p = q. We define a random variable X associated with this experiment 
as taking value 1 with probability p if heads occurs and value 0 with probability q if tails occurs. 
Such a random variable X is said to have a Bernoulli probability distribution. That is, X is a Bernoulli 
random variable if for some p, 0 < p < 1, the probability P(X = 1) = p and P(X = 0) = 1 - p. The 
probability function of a Bernoulli random variable X can be expressed as 

$$p(x) = P(X = x) = \begin{cases} p^x (1-p)^{1-x}, & x = 0, 1, \\ 0, & \text{otherwise.} \end{cases}$$

Note that this distribution is characterized by the single parameter p. It can be easily verified that 
the mean and variance of X are E[X] = p and Var(X) = pq, respectively, and the moment-generating 
function is $M_X(t) = pe^t + (1 - p)$. 


Even when the experimental values are not dichotomous, reclassifying the variable as a Bernoulli 
variable can be helpful. For example, consider blood pressure measurements. Instead of representing 
the numerical values of blood pressure, if we reclassify the blood pressure as “high blood pressure” 
and “low blood pressure,” we may be able to avoid dealing with a possible misclassification due to 
diurnal variation, stress, and so forth, and concentrate on the main issue, which would be: Is the 
average blood pressure unusually high? 


In a succession of Bernoulli trials, one is more interested in the total number of successes (whenever a 
1 occurs in a Bernoulli trial, we term it a “success”). The probability of observing exactly k successes in 
n independent Bernoulli trials yields the binomial probability distribution. In practice, the binomial 
probability distribution is used when we are concerned with the occurrence of an event, not its 
magnitude. For example, in a clinical trial, we may be more interested in the number of survivors 
after a treatment. 


Definition 3.2.1 A binomial experiment is one that has the following properties: (1) The experiment 
consists of n identical trials. (2) Each trial results in one of two outcomes, called a success S and a failure 
F. (3) The probability of success on a single trial is equal to p and remains the same from trial to trial. The 
probability of failure is 1 - p = q. (4) The outcomes of the trials are independent. (5) The random variable 
X is the number of successes in n trials. 


Earlier we have seen that the number of ways of obtaining x successes in n trials is given by 

$$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}.$$


Definition 3.2.2 A random variable X is said to have binomial probability distribution with parameters 
(n, p) if and only if 

$$P(X = x) = p(x) = \begin{cases} \dbinom{n}{x} p^x q^{n-x}, & x = 0, 1, 2, \ldots, n,\ 0 < p < 1,\ \text{and } q = 1 - p, \\ 0, & \text{otherwise.} \end{cases}$$

To show the dependence on n and p, denote p(x) by b(x, n, p) and the cumulative probabilities by 

$$B(x, n, p) = \sum_{i=0}^{x} b(i, n, p).$$


Binomial probabilities are tabulated in the binomial table. 
By the binomial theorem, we have 

$$(p + q)^n = \sum_{x=0}^{n} \binom{n}{x} p^x q^{n-x}.$$

Because p + q = 1, we conclude that $\sum_{i=0}^{n} b(i, n, p) = \sum_{x=0}^{n} \binom{n}{x} p^x q^{n-x} = 1^n = 1$ for all n ≥ 1 
and 0 < p < 1. Hence, p(x) is indeed a probability function. The binomial probability distribution 
is characterized by two parameters, the number of independent trials n and the probability of 
success p. 


Example 3.2.1 
It is known that screws produced by a certain machine will be defective with probability 0.01 independently 
of each other. If we randomly pick 10 screws produced by this machine, what is the probability that at least 
two screws will be defective? 


Solution 
Let X be the number of defective screws out of 10. Then X can be considered as a binomial r.v. with 
parameters (10, 0.01). Hence, using the binomial pf p(x) given in Definition 3.2.2, we obtain 

$$P(X \ge 2) = \sum_{x=2}^{10} \binom{10}{x} (0.01)^x (0.99)^{10-x} = 1 - [P(X = 0) + P(X = 1)] = 0.004.$$


In Chapter 2, we saw Mendel’s law. In biology, the result “gene frequencies and genotype ratios in 
a randomly breeding population remain constant from generation to generation” is known as the 
Hardy-Weinberg law. 


Example 3.2.2 
Suppose we know that the frequency of a dominant gene, A, in a population is equal to 0.2. If we randomly 
select eight members of this population, what is the probability that at least six of them will display the 
dominant phenotype? Assume that the population is sufficiently large that removing eight individuals will 
not affect the frequency and that the population is in Hardy-Weinberg equilibrium. 

Solution 
First of all, note that an individual displays the dominant trait A if the person has genotype AA, aA, or Aa. 
Hence, if the gene frequency is 0.2, the probability that an individual displays trait A is 

$$P(A) = P(AA \cup Aa \cup aA) = P(AA) + 2P(Aa) = (0.2)^2 + 2(0.2)(0.8) = 0.36.$$

Let X denote the number of individuals out of eight that display the dominant phenotype. Then X 
is binomial with n = 8 and p = 0.36. Thus, the probability that at least six of them will display the 
dominant phenotype is 

$$P(X \ge 6) = P(X = 6) + P(X = 7) + P(X = 8) = \sum_{i=6}^{8} \binom{8}{i} (0.36)^i (0.64)^{8-i} = 0.029259.$$


For large n, calculation of binomial probabilities is tedious. Many statistical software packages 
have binomial probability distribution commands. For the purpose of this book, we will use 
the binomial table that gives the cumulative probabilities B(x, n, p) for n = 2 through n = 20 and 
p = 0.05, 0.10, 0.15, ..., 0.90, 0.95. If we need the probability of a single term, we can use the 
relation 

$$P(X = x) = b(x, n, p) = B(x, n, p) - B(x - 1, n, p).$$

Example 3.2.3 
A manufacturer of inkjet printers claims that only 5% of their printers require repairs within the first year. If 
of a random sample of 18 of the printers, four required repairs within the first year, does this tend to refute 
or support the manufacturer's claim? 


Solution 

Let us assume that the manufacturer's claim is correct; that is, the probability that a printer will require 
repairs within the first year is 0.05. Suppose 18 printers are chosen at random. Let p be the probability that 
any one of the printers will require repairs within the first year. We now find the probability that at least four 
of these out of the 18 will require repairs during the first year. Let X represent the number of printers that 
require repair within the first year. Then X follows the binomial pmf with p = 0.05, n = 18. The probability 
that four or more of the 18 will require repair within the first year is given by 

$$P(X \ge 4) = \sum_{x=4}^{18} \binom{18}{x} (0.05)^x (0.95)^{18-x}$$

or, using the binomial table, 

$$\sum_{x=4}^{18} b(x, 18, 0.05) = 1 - B(3, 18, 0.05) = 1 - 0.9891 = 0.0109.$$

This value (approximately 1.1%) is very small. We have shown that if the manufacturer's claim is correct, 
then the chances of observing four or more bad printers out of 18 are very small. But we did observe exactly 
four bad ones. Therefore we must conclude that the manufacturer's claim cannot be substantiated. 

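When the binomial table is not at hand, the quantities b(x, n, p) and B(x, n, p) used in the three examples above can be computed directly. The following is a minimal sketch using only the Python standard library; the function names simply mirror the text's notation and are otherwise an assumption.

```python
import math


def b(x, n, p):
    """Binomial pf b(x, n, p); zero outside x = 0, ..., n."""
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)


def B(x, n, p):
    """Cumulative probability B(x, n, p) = b(0, n, p) + ... + b(x, n, p)."""
    return sum(b(i, n, p) for i in range(0, x + 1))


if __name__ == "__main__":
    print(1.0 - B(1, 10, 0.01))   # Example 3.2.1: P(X >= 2)
    print(1.0 - B(5, 8, 0.36))    # Example 3.2.2: P(X >= 6)
    print(1.0 - B(3, 18, 0.05))   # Example 3.2.3: P(X >= 4)
    # Single-term relation: b(x, n, p) = B(x, n, p) - B(x - 1, n, p)
    print(b(4, 18, 0.05), B(4, 18, 0.05) - B(3, 18, 0.05))
```

The printed values agree with the rounded answers in the examples; small differences in the last digit of Example 3.2.3 come from the four-decimal rounding of the table entry.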


MEAN, VARIANCE, AND MGF OF A BINOMIAL RANDOM VARIABLE 
Theorem 3.2.1 If X is a binomial random variable with parameters n and p, then 

$$E(X) = \mu = np, \qquad \mathrm{Var}(X) = \sigma^2 = np(1 - p).$$

Also the moment-generating function is 

$$M_X(t) = [pe^t + (1 - p)]^n.$$


Proof. We derive the mean and the variance. The derivation for the mgf is given in Example 2.6.8. Using 
the binomial pmf, $p(x) = \frac{n!}{x!(n-x)!}\, p^x q^{n-x}$, and the definition of expectation, we have 

$$\mu = E(X) = \sum_{x=0}^{n} x\,p(x) = \sum_{x=0}^{n} x\, \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x} = \sum_{x=1}^{n} \frac{n!}{(x-1)!\,(n-x)!}\, p^x q^{n-x},$$

since the first term in the sum is zero, as x = 0. 
Let i = x - 1. When x varies from 1 through n, i = (x - 1) varies from zero through (n - 1). Hence, 

$$\mu = \sum_{i=0}^{n-1} \frac{n!}{i!\,(n-i-1)!}\, p^{i+1} q^{n-i-1} = np \sum_{i=0}^{n-1} \frac{(n-1)!}{i!\,(n-1-i)!}\, p^{i} (1-p)^{n-1-i} = np,$$

because the last sum runs over all values of a binomial pmf with parameters (n - 1) and p and hence equals 1. 
To find the variance, we first calculate E[X(X - 1)]. 

$$E[X(X-1)] = \sum_{x=0}^{n} x(x-1)\, \frac{n!}{x!\,(n-x)!}\, p^x q^{n-x} = \sum_{x=2}^{n} \frac{n!}{(x-2)!\,(n-x)!}\, p^x q^{n-x},$$

because the first two terms are zero. Let i = x - 2. Then, 

$$E[X(X-1)] = \sum_{i=0}^{n-2} \frac{n!}{i!\,(n-i-2)!}\, p^{i+2} q^{n-i-2} = n(n-1)p^2 \sum_{i=0}^{n-2} \frac{(n-2)!}{i!\,(n-2-i)!}\, p^{i} (1-p)^{n-2-i} = n(n-1)p^2,$$

because the last sum runs over all values of a binomial pf with parameters (n - 2) and p and thus equals 1. 
Note that $E[X(X-1)] = E(X^2) - E(X)$, and so we obtain 

$$\begin{aligned}
\sigma^2 = \mathrm{Var}(X) &= E(X^2) - [E(X)]^2 \\
&= E[X(X-1)] + E(X) - [E(X)]^2 \\
&= n(n-1)p^2 + np - (np)^2 = -np^2 + np \\
&= np(1 - p).
\end{aligned}$$

3.2.2 Poisson Probability Distribution 


The Poisson probability distribution was introduced by the French mathematician Siméon-Denis 
Poisson in his book published in 1837, which was entitled Recherches sur la probabilité des jugements 
en matière criminelle et en matière civile and dealt with the applications of probability theory to lawsuits, 
criminal trials, and the like. Consider a statistical experiment of which A is an event of interest. 
A random variable that counts the number of occurrences of A is called a counting random variable. 
The Poisson random variable is an example of a counting random variable. Here we assume that the 
numbers of occurrences in disjoint intervals are independent and the mean number of occurrences 
is constant. 

Definition 3.2.3 A discrete random variable X is said to follow the Poisson probability distribution 
with parameter $\lambda > 0$, denoted by Poisson($\lambda$), if 

$$P(X = x) = f(x, \lambda) = f(x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots.$$


The Poisson probability distribution is characterized by the single parameter, $\lambda$, which represents the 
mean of a Poisson probability distribution. Thus, in order to specify the Poisson distribution, we only 
need to know the mean number of occurrences. This distribution is of fundamental theoretical and 
practical importance. Rare events are modeled by the Poisson distribution. For example, the Poisson 
probability distribution has been used in the study of telephone systems. The number of incoming 
calls into a telephone exchange during a unit time might be modeled by a Poisson variable assuming 
that the exchange services a large number of customers who call more or less independently. Some 
other problems where Poisson representation can be used are the number of misprints in a book, 
radioactivity counts per unit time, the number of plankton (microscopic plant or animal organisms 
that float in bodies of water) per aliquot of seawater, or count of bacterial colonies per petri plate 
in a microbiological study. In stem cell research, the Poisson distribution is used to analyze the 
redundancy of clusters in the stem cell database. A Poisson probability distribution has the unique 
property that its mean equals its variance. 


MEAN, VARIANCE, AND MOMENT-GENERATING FUNCTION OF A POISSON RANDOM VARIABLE 
Theorem 3.2.2 If X is a Poisson random variable with parameter $\lambda$, then 

$$E(X) = \lambda, \qquad \mathrm{Var}(X) = \lambda.$$

Also the moment-generating function is 

$$M_X(t) = e^{\lambda(e^t - 1)}.$$

The proof of this result is similar to that we used in Theorem 3.2.1 in this section. One needs to use 
the Maclaurin expansion, $e^{\lambda} = \sum_{i=0}^{\infty} \lambda^i/i!$. 
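The steps of that proof can be sketched as follows. This derivation is supplied here as an illustration and follows the same pattern as the binomial case: drop the zero terms, shift the index, and recognize the series for $e^{\lambda}$.

```latex
\begin{aligned}
E(X) &= \sum_{x=0}^{\infty} x\,\frac{e^{-\lambda}\lambda^{x}}{x!}
      = \lambda e^{-\lambda}\sum_{x=1}^{\infty}\frac{\lambda^{x-1}}{(x-1)!}
      = \lambda e^{-\lambda}\sum_{i=0}^{\infty}\frac{\lambda^{i}}{i!}
      = \lambda e^{-\lambda}e^{\lambda} = \lambda, \\
E[X(X-1)] &= \sum_{x=0}^{\infty} x(x-1)\,\frac{e^{-\lambda}\lambda^{x}}{x!}
      = \lambda^{2} e^{-\lambda}\sum_{x=2}^{\infty}\frac{\lambda^{x-2}}{(x-2)!}
      = \lambda^{2}, \\
\operatorname{Var}(X) &= E[X(X-1)] + E(X) - [E(X)]^{2}
      = \lambda^{2} + \lambda - \lambda^{2} = \lambda .
\end{aligned}
```

The mgf itself is the one derived in Example 2.6.9.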


Example 3.2.4 
Let X be a Poisson random variable with $\lambda = 1/2$. Find 
(a) P(X = 0) 
(b) P(X ≥ 3) 


Solution 
(a) We have 

$$P(X = 0) = p(0) = \frac{e^{-1/2}(1/2)^0}{0!} = e^{-1/2} = 0.60653.$$

(b) Here we will use the complementary event to compute the required probability. That is, 

$$\begin{aligned}
P(X \ge 3) &= 1 - P(X \le 2) = 1 - [p(0) + p(1) + p(2)] \\
&= 1 - \left[ e^{-1/2} + \frac{e^{-1/2}(1/2)}{1!} + \frac{e^{-1/2}(1/2)^2}{2!} \right] \\
&= 1 - 0.98561 = 0.01439.
\end{aligned}$$


When n is large and p is small, binomial probabilities are often approximated by Poisson probabilities. 
In these situations, performing the factorial and exponential operations required for direct 
calculation of binomial probabilities is a lengthy and tedious process, and when tables are not available, 
the Poisson approximation is more feasible. The following theorem states this result. 


POISSON APPROXIMATION TO THE BINOMIAL PROBABILITY DISTRIBUTION 
Theorem 3.2.3 If X is a binomial r.v. with parameters n and p, then for each value x = 0, 1, 2, ... and as 
$p \to 0$, $n \to \infty$ with $np = \lambda$ constant, 

$$\lim_{n \to \infty} \binom{n}{x} p^x (1 - p)^{n-x} = \frac{e^{-\lambda}\lambda^x}{x!}.$$

The proof of this result is similar to that we used in Theorem 3.2.1. In the present context, the Poisson 
probability distribution is sometimes referred to as “the distribution of rare events” because of the 
fact that p is quite small when n is large. Usually, if p < 0.1 and n > 40 we could use the Poisson 
approximation in practice. In general, another rule of thumb is to use the Poisson approximation to 
the binomial in the case of np < 5. 



Example 3.2.5 

If the probability that an individual suffers an adverse reaction from a particular drug is known to be 0.001, 
determine the probability that out of 2000 individuals, (a) exactly three and (b) more than two individuals 
will suffer an adverse reaction. 


Solution 
Let Y be the number of individuals who suffer an adverse reaction. Then Y is binomial with n = 2000 and 
p = 0.001. Because n is large and p is small, we can use the Poisson approximation with $\lambda = np = 2$. 

(a) The probability that exactly three individuals will suffer an adverse reaction is 

$$P(Y = 3) = \frac{2^3 e^{-2}}{3!} = 0.18.$$

That is, there is approximately an 18% chance that exactly three individuals of 2000 will suffer an 
adverse reaction. 

(b) The probability that more than two individuals will suffer an adverse reaction is 

$$P(Y > 2) = 1 - P(Y = 0) - P(Y = 1) - P(Y = 2) = 1 - 5e^{-2} = 0.323.$$

Similarly, there is approximately a 32.3% chance that more than two individuals will have an adverse 
reaction. 
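To see how close the approximation in this example is, the exact binomial probabilities can be computed next to the Poisson ones. The sketch below is illustrative only; the function names are assumptions, and the exact values rely on Python's arbitrary-precision integers for the binomial coefficient.

```python
import math


def binomial_pf(x, n, p):
    """Exact binomial probability P(Y = x)."""
    return math.comb(n, x) * p**x * (1.0 - p) ** (n - x)


def poisson_pf(x, lam):
    """Poisson probability e^{-lam} lam^x / x!."""
    return math.exp(-lam) * lam**x / math.factorial(x)


if __name__ == "__main__":
    n, p = 2000, 0.001
    lam = n * p  # Poisson parameter, lambda = np = 2

    # (a) exactly three adverse reactions
    print(binomial_pf(3, n, p), poisson_pf(3, lam))

    # (b) more than two adverse reactions
    exact_tail = 1.0 - sum(binomial_pf(x, n, p) for x in range(3))
    approx_tail = 1.0 - sum(poisson_pf(x, lam) for x in range(3))
    print(exact_tail, approx_tail)
```

With n this large and p this small, the two columns should differ only beyond the third decimal place, which is consistent with the rule of thumb stated after Theorem 3.2.3.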


Now we will discuss some continuous distributions. As mentioned earlier, if X is a continuous random 
variable with pdf f(x), then 

$$P(a \le X \le b) = \int_a^b f(x)\,dx.$$


3.2.3 Uniform Probability Distribution 


The uniform probability distribution is used to generate random numbers from other distributions 
and also is useful as a “first guess” if no other information about a random variable X is known, 
other than that it is between a and b. Also, in real-world problems that have uniform behavior in a 
given interval, we can characterize the probabilistic behavior of such a phenomenon by the uniform 
distribution. (See Figure 3.1.) 


Definition 3.2.4 A random variable X is said to have a uniform probability distribution on (a, b), 
denoted by U(a, b), if the density function of X is given by 

$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b, \\ 0, & \text{otherwise.} \end{cases}$$

The cumulative distribution function is given by 

$$F(x) = \begin{cases} 0, & x < a, \\ \displaystyle\int_{-\infty}^{x} f(t)\,dt = \frac{x - a}{b - a}, & a \le x < b, \\ 1, & x \ge b. \end{cases}$$

FIGURE 3.1 Uniform probability density: f(x) = 1/(b - a) on (a, b) and f(x) = 0 elsewhere. 


Example 3.2.6 
If X is a uniformly distributed random variable over (0, 10), calculate the probability that (a) X < 3, 
(b) X > 6, and (c) 3 < X < 8. 

Solution 
(a) 
$$P(X < 3) = \int_0^3 \frac{1}{10}\,dx = \frac{3}{10}.$$
(b) 
$$P(X > 6) = \int_6^{10} \frac{1}{10}\,dx = \frac{4}{10}.$$
(c) 
$$P(3 < X < 8) = \int_3^8 \frac{1}{10}\,dx = \frac{1}{2}.$$
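
The three integrals in Example 3.2.6 can also be read off the cumulative distribution function of Definition 3.2.4. The short Python sketch below does this; the helper names `uniform_cdf` and `uniform_prob` are hypothetical and introduced only for illustration.

```python
def uniform_cdf(x, a, b):
    """CDF of U(a, b): 0 below a, (x - a)/(b - a) on [a, b), and 1 from b onward."""
    if x < a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def uniform_prob(lo, hi, a, b):
    """P(lo < X < hi) for X ~ U(a, b), computed as F(hi) - F(lo)."""
    return uniform_cdf(hi, a, b) - uniform_cdf(lo, a, b)

a, b = 0, 10
print("P(X < 3)     =", uniform_cdf(3, a, b))
print("P(X > 6)     =", 1 - uniform_cdf(6, a, b))
print("P(3 < X < 8) =", uniform_prob(3, 8, a, b))
```

Because X is continuous, strict and non-strict inequalities give the same probabilities, so the same function serves for both.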


MEAN, VARIANCE, AND MOMENT-GENERATING FUNCTION OF A UNIFORM RANDOM VARIABLE 
Theorem 3.2.4 If X is a uniformly distributed random variable on (a, b), then 

$$E(X) = \frac{a + b}{2}$$

and 

$$\mathrm{Var}(X) = \frac{(b - a)^2}{12}.$$

Also, the moment-generating function is 

$$M_X(t) = \begin{cases} \dfrac{e^{tb} - e^{ta}}{t(b - a)}, & t \ne 0, \\ 1, & t = 0. \end{cases}$$

Proof. We will obtain the mean and the variance and leave the derivation of the moment-generating 
function as an exercise. By definition we have 

$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_a^b \frac{x}{b - a}\,dx = \frac{1}{b - a} \left[ \frac{x^2}{2} \right]_a^b = \frac{b^2 - a^2}{2(b - a)} = \frac{a + b}{2}.$$

Also 

$$E(X^2) = \int_a^b x^2 \frac{1}{b - a}\,dx = \frac{1}{b - a} \left[ \frac{x^3}{3} \right]_a^b = \frac{1}{3}\,\frac{b^3 - a^3}{b - a} = \frac{1}{3}(b^2 + ab + a^2),$$

as $b^3 - a^3 = (b - a)(b^2 + ab + a^2)$. Thus, 

$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \frac{1}{3}(b^2 + ab + a^2) - \frac{(a + b)^2}{4} = \frac{1}{12}(b - a)^2.$$

Example 3.2.7 
The melting point, X, of a certain solid may be assumed to be a continuous random variable that is uniformly 
distributed between the temperatures 100°C and 120°C. Find the probability that such a solid will melt 
between 112°C and 115°C. 

Solution 
The probability density function is given by 

$$f(x) = \begin{cases} \dfrac{1}{20}, & 100 \le x \le 120, \\ 0, & \text{otherwise.} \end{cases}$$

Hence, 

$$P(112 < X < 115) = \int_{112}^{115} \frac{1}{20}\,dx = \frac{3}{20} = 0.15.$$

Thus, there is a 15% chance of this solid melting between 112°C and 115°C. 


3.2.4 Normal Probability Distribution 


The single most important distribution in probability and statistics is the normal probability distribution. 
The density function of a normal probability distribution is bell shaped and symmetric about 
the mean. The normal probability distribution was introduced by the French mathematician Abraham 
de Moivre in 1733. He used it to approximate probabilities associated with binomial random 
variables when n is large. This was later extended by Laplace to the so-called Central Limit Theorem, 
which is one of the most important results in probability. Carl Friedrich Gauss in 1809 used the normal 
distribution to solve the important statistical problem of combining observations. Because Gauss 
played such a prominent role in determining the usefulness of the normal probability distribution, 
the normal probability distribution is often called the Gaussian distribution. Gauss and Laplace noticed 
that measurement errors tend to follow a bell-shaped curve, a normal probability distribution. Today, 
the normal probability distribution arises repeatedly in diverse areas of applications. For example, in 
biology, it has been observed that the normal probability distribution fits data on the heights and 
weights of human and animal populations, among others. 


We should also mention here that almost all basic statistical inference is based on the normal probability 
distribution. The question that often arises is, when do we know that our data follow the 
normal distribution? To answer this question we have specific statistical procedures that we study 
in later chapters, but at this point we can obtain some constructive indications of whether the data 
follow the normal distribution by using descriptive statistics. That is, if the histogram of our data can 
be capped with a bell-shaped curve (Figure 3.2), if the stem-and-leaf diagram is fairly symmetrical 
with respect to its center, and/or by invoking the empirical rule “backwards,” we can obtain a good 
indication whether our data follow the normal probability distribution. 


Definition 3.2.5 A random variable X is said to have a normal probability distribution with parameters 
$\mu$ and $\sigma^2$ if it has a probability density function given by 

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2 / (2\sigma^2)}, \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0.$$

If $\mu = 0$ and $\sigma = 1$, we call it a standard normal random variable. 


For any normal random variable with mean $\mu$ and variance $\sigma^2$, we use the notation $X \sim N(\mu, \sigma^2)$. 
When a random variable X has a standard normal probability distribution, we will write $X \sim N(0, 1)$ 
(X is normal with mean 0 and variance 1). Probabilities for a standard normal probability 
distribution are given in the normal table. 

FIGURE 3.2 Standard normal density function. 


MEAN, VARIANCE, AND MGF OF A NORMAL RANDOM VARIABLE 
Theorem 3.2.5 If $X \sim N(\mu, \sigma^2)$, then $E(X) = \mu$ and $\mathrm{Var}(X) = \sigma^2$. Also the moment-generating function is 

$$M_X(t) = e^{t\mu + \frac{1}{2} t^2 \sigma^2}.$$

If $X \sim N(\mu, \sigma^2)$, then the z-transform (or z-score) of X, $Z = \frac{X - \mu}{\sigma}$, is an N(0, 1) random variable. This 
fact will be used in calculating probabilities for normal random variables. 



Example 3.2.8 
(a) For Z ~ N(0, 1), calculate P(Z > 1.13). 
(b) For X ~ N(5, 4), calculate P(-2.5 < X < 10). 

Solution 
(a) Using the normal table, 

$$P(Z > 1.13) = 1 - 0.8708 = 0.1292.$$

This is the area under the standard normal density to the right of 1.13. 

(b) Using the z-transform with $\mu = 5$ and $\sigma = 2$, we have 

$$\begin{aligned} P(-2.5 < X < 10) &= P\left( \frac{-2.5 - 5}{2} < Z < \frac{10 - 5}{2} \right) \\ &= P(-3.75 < Z < 2.5) \\ &= P(-3.75 < Z < 0) + P(0 < Z < 2.5) \\ &= 0.9938. \end{aligned}$$
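
As a cross-check on the table lookups in Example 3.2.8, the following sketch computes the same probabilities with `statistics.NormalDist` from the Python standard library. The variable names are illustrative, and software values can differ from table values in the fourth decimal place because tables are rounded.

```python
from statistics import NormalDist

Z = NormalDist()                  # standard normal: mu = 0, sigma = 1
p_a = 1 - Z.cdf(1.13)             # P(Z > 1.13)

# N(5, 4) means variance 4, so the standard deviation passed here is 2.
X = NormalDist(mu=5, sigma=2)
p_b = X.cdf(10) - X.cdf(-2.5)     # P(-2.5 < X < 10), computed directly

# The same probability through the z-transform Z = (X - mu) / sigma.
z_lo = (-2.5 - 5) / 2
z_hi = (10 - 5) / 2
p_b_z = Z.cdf(z_hi) - Z.cdf(z_lo)

print(f"P(Z > 1.13)      = {p_a:.4f}")
print(f"P(-2.5 < X < 10) = {p_b:.4f} (direct), {p_b_z:.4f} (z-transform)")
```

Note that `NormalDist` takes the standard deviation, not the variance, which is an easy place to slip when translating the notation $N(\mu, \sigma^2)$.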


In the following example, we will show how to find the z values when the probabilities are given. 



Example 3.2.9 
For a standard normal random variable Z, find the value of $z_0$ such that 
(a) $P(Z > z_0) = 0.25$. 
(b) $P(Z < z_0) = 0.95$. 
(c) $P(Z < z_0) = 0.12$. 
(d) $P(Z > z_0) = 0.68$. 

Solution 
(a) From the normal table, and using the fact that the area to the right of $z_0$ is 0.25, we obtain 
$z_0 \approx 0.675$. 
(b) Because $P(Z < z_0) = 1 - P(Z \ge z_0) = 0.95 = 0.5 + 0.45$, we have $P(Z > z_0) = 0.05$. From 
the normal table, $z_0 = 1.645$. 
(c) From the normal table, $z_0 = -1.175$. 
(d) Because 0.68 > 0.5, $z_0$ is negative, and $P(Z > z_0) = 0.5 + P(z_0 < Z < 0) = 0.68$. 
This implies $P(Z < z_0) = 0.32$. From the normal table, $z_0 = -0.465$. 
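
The reverse lookups of Example 3.2.9 correspond to evaluating the inverse of the standard normal cdf. A minimal sketch using `statistics.NormalDist.inv_cdf` from the Python standard library follows; each right-tail condition $P(Z > z_0) = q$ is first rewritten as $P(Z < z_0) = 1 - q$. The variable names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

z_a = Z.inv_cdf(1 - 0.25)  # (a) P(Z > z0) = 0.25, so P(Z < z0) = 0.75
z_b = Z.inv_cdf(0.95)      # (b) P(Z < z0) = 0.95
z_c = Z.inv_cdf(0.12)      # (c) P(Z < z0) = 0.12
z_d = Z.inv_cdf(1 - 0.68)  # (d) P(Z > z0) = 0.68, so P(Z < z0) = 0.32

for label, z0 in [("a", z_a), ("b", z_b), ("c", z_c), ("d", z_d)]:
    print(f"({label}) z0 = {z0:.3f}")
```

The computed values may differ slightly from the table answers in the third decimal place, since table lookups interpolate between rounded entries.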


Example 3.2.10 
The scores of an examination are assumed to be normally distributed with $\mu = 75$ and $\sigma^2 = 64$. What is the 
probability that a score chosen at random will be greater than 85? 

Solution 
Let X be a randomly chosen score from the exam scores. Then, X ~ N(75, 64). 

$$P(X > 85) = P\left( \frac{X - 75}{8} > \frac{85 - 75}{8} \right) = P(Z > 1.25) = 0.1056.$$

Thus, there is about a 10.56% chance that the score will be greater than 85. 

In practice, whenever a large number of small effects are present and acting additively, it is reasonable 
to assume that observations will be normal. When the number of observations is small, it is risky to assume 
a normal distribution without proper testing. Apart from histograms, box plots, and stem-and-leaf 
displays, one of the most useful tools for assessing normality is a quantile-quantile or QQ plot. This is 
a scatterplot with the quantiles of the scores on the horizontal axis and the expected normal scores on 
the vertical axis. The expected normal scores are calculated by taking the z-scores of $(r_i - 0.5)/n$, where 
$r_i$ is the rank of the ith observation in increasing order. To construct a QQ plot, we first sort the data 
in ascending order and then plot the sorted scores against the expected normal scores. If the plot 
is a straight line, then the data can be considered normal. Any curvature of the points indicates 
departures from normality. The procedure for obtaining a normal plot (a QQ plot is similar to a normal 
plot for a normal distribution) is described in Project 4C. Figure 3.3 shows a normal probability plot 
generated by Minitab. 
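
The construction just described can be sketched in a few lines of Python. The sample below is made up for illustration, the function name is hypothetical, and ties are ranked simply by sorted position (an assumption; statistical packages may treat ties and plotting positions differently).

```python
from statistics import NormalDist

def expected_normal_scores(data):
    """Return (sorted data, expected normal scores) as paired QQ-plot coordinates.

    The score for the observation of rank r is the z-value whose left-tail
    area is (r - 0.5) / n.
    """
    n = len(data)
    ordered = sorted(data)
    z = NormalDist()
    scores = [z.inv_cdf((r - 0.5) / n) for r in range(1, n + 1)]
    return ordered, scores

sample = [4.1, 5.3, 3.8, 6.0, 5.1, 4.7, 5.6]  # made-up data
xs, zs = expected_normal_scores(sample)
for x, score in zip(xs, zs):
    print(f"{x:5.1f}  {score:7.3f}")
```

Plotting the pairs (xs, zs) with any plotting tool and checking whether they fall near a straight line gives the visual assessment described above.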


If plotted points do not fit the line well, but bend away from it in places, the distribution may be 
nonnormal. The shapes in Figure 3.4 will give some indication of the distribution of the data. 


FIGURE 3.3 Normal probability plot. 

If the layout of points appears to bend up and to the left of the normal line, that indicates a long 
tail to the right, or right skew. 

If the layout of points bends down and to the right of the normal line, that indicates a long tail 
to the left, or left skew. 

An S-shaped layout of points indicates shorter than normal tails; thus, a smaller variance is expected. 

If the layout of points starts below the normal line, bends to follow it, and ends above it, this 
indicates long tails. That is, there is more variance than we would expect in a normal distribution. 

FIGURE 3.4 Shapes indicating distribution of the data. 


Almost all statistical software packages include a procedure for obtaining a normal 
probability plot that can be used to test the normality of the data. A discussion of how to do this is 
given in Section 14.4. Errors in the measurements can also act in a multiplicative (rather than additive) 
manner. In that case, the assumption of normality is not justified. 


A distribution closely related to the normal distribution is the log-normal distribution. A variable might 
be modeled as log-normal if it can be thought of as the multiplicative effect of many small independent 
factors. This distribution arises in physical problems when the domain of the variate, X, is greater 
than zero and its histogram is markedly skewed. If a random variable Y is normally distributed, 
then exp(Y) has a log-normal distribution. Thus, the natural logarithm of a log-normally distributed 
variable is normally distributed. That is, if X is a random variable with log-normal distribution, then 
ln(X) is normally distributed. Most biological evidence suggests that the growth processes of living 
tissue proceed by multiplicative, not additive, increments. Thus, the measures of body size should at 
most follow a log-normal rather than normal distribution. Also, the sizes of plants and animals are 
approximately log-normal. The log-normal distribution is also useful in modeling claim sizes in 
the insurance industry. 


The probability density function of a log-normal random variable, X, is given as 

$$f(x) = \begin{cases} \dfrac{1}{x \sigma_y \sqrt{2\pi}} e^{-(\ln x - \mu_y)^2 / (2\sigma_y^2)}, & x > 0,\ \sigma_y > 0,\ -\infty < \mu_y < \infty, \\ 0, & \text{otherwise,} \end{cases}$$

where $\mu_y$ and $\sigma_y$ are the mean and standard deviation of Y = ln(X). These parameters are related to 
the parameters of the random variable X as follows: 

$$\mu_y = \ln \sqrt{\frac{\mu_x^4}{\mu_x^2 + \sigma_x^2}}, \qquad \sigma_y^2 = \ln \left( \frac{\mu_x^2 + \sigma_x^2}{\mu_x^2} \right).$$

We can verify that the expected value of X is 

$$E(X) = e^{\mu_y + \sigma_y^2 / 2}$$

and the variance is 

$$\mathrm{Var}(X) = \left( e^{\sigma_y^2} - 1 \right) e^{2\mu_y + \sigma_y^2}.$$


The question of when the log-normal distribution is applicable in a given physical problem after a 
certain amount of data has been obtained can be answered by creating a normal probability plot of 
ln(X) and testing for normality. Thus, if the natural logarithms of the data show normality, the log-normal 
distribution may be more appropriate. 


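If X is log-normally distributed with parameters $\mu_y$ and $\sigma_y$, and 0 < a < b, then with Y = ln(X) 

$$\begin{aligned} P(a \le X \le b) &= P(\ln a \le Y \le \ln b) \\ &= P\left( \frac{\ln a - \mu_y}{\sigma_y} \le \frac{Y - \mu_y}{\sigma_y} \le \frac{\ln b - \mu_y}{\sigma_y} \right) \\ &= P(a' \le Z \le b'), \end{aligned}$$

where $a' = (\ln a - \mu_y)/\sigma_y$, $b' = (\ln b - \mu_y)/\sigma_y$, and Z ~ N(0, 1). This probability can be obtained from the standard normal table. 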
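
The two steps above (converting the mean and standard deviation of X into the parameters of Y = ln X, then standardizing) can be sketched as follows. The conversion uses $\sigma_y^2 = \ln(1 + \sigma_x^2/\mu_x^2)$ and $\mu_y = \ln \mu_x - \sigma_y^2/2$, which is an algebraically equivalent way of writing the relations between the parameters of X and Y. The input values (mean 10, standard deviation 5) and the function names are assumptions made only for illustration.

```python
import math
from statistics import NormalDist

def lognormal_params(mean_x, sd_x):
    """Convert the mean and sd of X into (mu_y, sigma_y) for Y = ln X."""
    var_y = math.log(1 + (sd_x / mean_x) ** 2)
    mu_y = math.log(mean_x) - var_y / 2
    return mu_y, math.sqrt(var_y)

def lognormal_prob(a, b, mu_y, sigma_y):
    """P(a < X < b) for log-normal X with 0 < a < b, via the standard normal cdf."""
    z = NormalDist()
    upper = (math.log(b) - mu_y) / sigma_y
    lower = (math.log(a) - mu_y) / sigma_y
    return z.cdf(upper) - z.cdf(lower)

# Illustrative inputs, not taken from the text.
mu_y, sigma_y = lognormal_params(10.0, 5.0)

# Round trip: recover E(X) and Var(X) from mu_y and sigma_y.
mean_back = math.exp(mu_y + sigma_y**2 / 2)
var_back = (math.exp(sigma_y**2) - 1) * math.exp(2 * mu_y + sigma_y**2)

print(f"mu_y = {mu_y:.4f}, sigma_y = {sigma_y:.4f}")
print(f"E(X) = {mean_back:.4f}, Var(X) = {var_back:.4f}")
print(f"P(5 < X < 15) = {lognormal_prob(5, 15, mu_y, sigma_y):.4f}")
```

The round trip is a useful sanity check: feeding the converted parameters back into the formulas for E(X) and Var(X) should return the original mean and variance.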



Example 3.2.11 
In an effort to establish a suitable height for the controls of a moving vehicle, information was gathered 
about X, the amounts by which the heights of the operators vary from 60 inches, which is the minimum 
height. It was verified that the data that were collected followed the log-normal distribution by a normal 
probability plot of Y = ln X. Assume that $\mu_x = 6$ in. and $\sigma_x = 2$ in. 

(a) What percentage of operators would have a height less than 65.5 in.? 

(b) If an operator is chosen at random, what is the probability that his or her height will be between 
64 and 66 in.? 


Solution 
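(a) Here, X = 65.5 - 60 = 5.5. Also, from the relations between the parameters of X and Y = ln X, 

$$\mu_y = \ln \sqrt{\frac{6^4}{6^2 + 2^2}} = 1.74, \qquad \sigma_y^2 = \ln \left( \frac{6^2 + 2^2}{6^2} \right) = 0.1054, \qquad \sigma_y = 0.3246.$$

Thus, 

$$P(X < 5.5) = P(Y < \ln 5.5) = P\left( Z < \frac{\ln 5.5 - 1.74}{0.3246} \right) = P(Z < -0.11) = 0.4562.$$

Hence, about 45.6% of the operators have a height less than 65.5 in. 
(b) Similar to part (a), we get 

$$\begin{aligned} P(4 < X < 6) &= P(\ln 4 < Y < \ln 6) \\ &= P\left( \frac{\ln 4 - 1.74}{0.3246} < Z < \frac{\ln 6 - 1.74}{0.3246} \right) \\ &= P(-1.09 < Z < 0.16) = 0.4257. \end{aligned}$$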


3.2.5 Gamma Probability Distribution 


The gamma probability distribution has found applications in various fields. For example, in engineering, 
the gamma probability distribution has been employed in the study of system reliability. We 
describe the gamma function before we introduce the gamma probability distribution. The gamma 
function, denoted by $\Gamma(\alpha)$, is defined as 

$$\Gamma(\alpha) = \int_0^{\infty} e^{-x} x^{\alpha - 1}\,dx, \quad \alpha > 0.$$

It can be shown using integration by parts that for $\alpha > 1$, $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$. In particular, if 
n is a positive integer, $\Gamma(n) = (n - 1)!$. 
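
These properties are easy to check numerically. The sketch below uses `math.gamma` from the Python standard library and, as a rough independent check, a midpoint-rule approximation of the defining integral. The truncation point, the step count, and the test value 3.7 are arbitrary choices made for illustration.

```python
import math

def gamma_integral(alpha, upper=60.0, steps=200_000):
    """Midpoint-rule approximation of the integral of e^(-x) x^(alpha - 1) over (0, upper)."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        total += math.exp(-x) * x ** (alpha - 1)
    return total * h

# Gamma(n) = (n - 1)! for positive integers n.
for n in range(1, 6):
    print(n, math.gamma(n), math.factorial(n - 1))

# Recursion Gamma(alpha) = (alpha - 1) * Gamma(alpha - 1) at a non-integer point.
alpha = 3.7
lhs = math.gamma(alpha)
rhs = (alpha - 1) * math.gamma(alpha - 1)
print(lhs, rhs, gamma_integral(alpha))
```

The three printed numbers on the last line should agree to several decimal places, which ties the integral definition to the recursion.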


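Definition 3.2.6 A random variable X is said to possess a gamma probability distribution with 
parameters $\alpha > 0$ and $\beta > 0$ if it has the pdf given by 

$$f(x) = \begin{cases} \dfrac{1}{\beta^{\alpha} \Gamma(\alpha)} x^{\alpha - 1} e^{-x/\beta}, & x > 0, \\ 0, & \text{otherwise.} \end{cases}$$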


The gamma density has two parameters, $\alpha$ and $\beta$. We denote this by Gamma($\alpha$, $\beta$). The parameter 
$\alpha$ is called a shape parameter, and $\beta$ is called a scale parameter. Changing $\alpha$ changes the shape of the 
density, whereas varying $\beta$ corresponds to changing the units of measurement (such as changing from 
seconds to minutes). Varying these two parameters will generate different members of the gamma 
family. If we take $\alpha$ to be a positive integer, we get a special case of the gamma probability distribution 
known as the Erlang distribution. This is used extensively in queuing theory to model waiting times. 
Figure 3.5 gives an indication of how $\alpha$ and $\beta$ influence the shape and scale of f(x). 

FIGURE 3.5 Gamma pdfs for different degrees of freedom. Curves shown: Gamma(2, 3), Gamma(3, 1), Gamma(4, 3), and Gamma(2, 4). 


MEAN, VARIANCE, AND MGF OF A GAMMA RANDOM VARIABLE 
Theorem 3.2.6 If X is a gamma random variable with parameters $\alpha > 0$ and $\beta > 0$, then 

$$E(X) = \alpha\beta \quad \text{and} \quad \mathrm{Var}(X) = \alpha\beta^2.$$

Also, the moment-generating function is 

$$M_X(t) = \frac{1}{(1 - \beta t)^{\alpha}}, \quad t < \frac{1}{\beta}.$$

Example 3.2.12 
The daily consumption of aviation fuel in millions of gallons at a certain airport can be treated as a gamma 
random variable with $\alpha = 3$, $\beta = 1$. 
(a) What is the probability that on a given day the fuel consumption will be less than 1 million gallons? 
(b) Suppose the airport can store only 2 million gallons of fuel. What is the probability that the fuel 
supply will be inadequate on a given day? 


Solution 
(a) Let X be the fuel consumption in millions of gallons on a given day at a certain airport. Then, 
$X \sim \Gamma(\alpha = 3, \beta = 1)$ and 

$$f(x) = \frac{1}{\Gamma(3)(1^3)} x^{3 - 1} e^{-x} = \frac{1}{2} x^2 e^{-x}, \quad x > 0.$$

Hence, using integration by parts, we obtain 

$$P(X < 1) = \frac{1}{2} \int_0^1 x^2 e^{-x}\,dx = 1 - \frac{5}{2e} = 0.0803.$$

Thus, there is about an 8% chance that on a given day the fuel consumption will be less than 
1 million gallons. 
(b) Because the airport can store only 2 million gallons, the fuel supply will be inadequate if the fuel 
consumption X is greater than 2. Thus, 

$$P(X > 2) = \frac{1}{2} \int_2^{\infty} x^2 e^{-x}\,dx = 5e^{-2} = 0.677.$$

We can conclude that there is about a 67.7% chance that the fuel supply of 2 million gallons will be 
inadequate on a given day. So, if the model is right, the airport needs to store more than 2 million 
gallons of fuel. 
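
The integration by parts in Example 3.2.12 can be cross-checked in code. When the shape parameter $\alpha$ is a positive integer (the Erlang case), repeated integration by parts gives the closed form $F(x) = 1 - e^{-x/\beta} \sum_{k=0}^{\alpha - 1} (x/\beta)^k / k!$; this is a standard identity rather than a formula stated in the text, and the function name below is a hypothetical helper.

```python
import math

def erlang_cdf(x, alpha, beta):
    """P(X <= x) for Gamma(alpha, beta) with positive-integer shape alpha and scale beta."""
    if x <= 0:
        return 0.0
    u = x / beta
    tail = math.exp(-u) * sum(u**k / math.factorial(k) for k in range(alpha))
    return 1.0 - tail

p_less_1 = erlang_cdf(1, 3, 1)       # part (a): P(X < 1)
p_more_2 = 1 - erlang_cdf(2, 3, 1)   # part (b): P(X > 2)

print(f"P(X < 1) = {p_less_1:.4f}")
print(f"P(X > 2) = {p_more_2:.4f}")
```

For non-integer shape parameters this closed form does not apply, and a numerical routine for the incomplete gamma function would be needed instead.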
We now describe two special cases of the gamma probability distribution. If we let $\alpha = 1$ in the pdf 
of the gamma, we get the pdf of an exponential random variable. 


Definition 3.2.7 A random variable X is said to have an exponential probability distribution with 
parameter $\beta$ if the pdf of X is given by 

$$f(x) = \begin{cases} \dfrac{1}{\beta} e^{-x/\beta}, & \beta > 0;\ 0 \le x < \infty, \\ 0, & \text{otherwise.} \end{cases}$$


Exponential random variables are often used to model the lifetimes of electronic components such 
as fuses, for survival analysis, and for reliability analysis, among others. The exponential distribution 
(Figure 3.6) is also used in developing models of insurance risks. 

FIGURE 3.6 Probability density function for exponential r.v. (plot title: Exponential (3)). 


MEAN, VARIANCE, AND MGF OF AN EXPONENTIAL RANDOM VARIABLE 
Theorem 3.2.7 If X is an exponential random variable with parameter $\beta > 0$, then 

$$E(X) = \beta \quad \text{and} \quad \mathrm{Var}(X) = \beta^2.$$

Also the moment-generating function is 

$$M_X(t) = \frac{1}{1 - \beta t}, \quad t < \frac{1}{\beta}.$$



Example 3.2.13 
The time, in hours, during which an electrical generator is operational is a random variable that follows 
the exponential distribution with $\beta = 160$. What is the probability that a generator of this type will be 
operational for 

(a) Less than 40 hours? 

(b) Between 60 and 160 hours? 

(c) More than 200 hours? 


Solution 
Let X denote the random variable corresponding to time (in hours) during which the generator is operational. 
Then the density function of X is given by 

$$f(x) = \begin{cases} \dfrac{1}{160} e^{-x/160}, & x \ge 0, \\ 0, & \text{otherwise.} \end{cases}$$

Thus, we have the following: 
(a) $P(X < 40) = \int_0^{40} \frac{1}{160} e^{-x/160}\,dx = 0.2212$. There is about a 22.1% chance that a generator of 
this type will be operational for less than 40 hours. 

(b) $P(60 < X < 160) = \int_{60}^{160} \frac{1}{160} e^{-x/160}\,dx = 0.3194$. Hence, there is about a 31.94% chance that 
a generator of this type will be operational between 60 and 160 hours. 

(c) $P(X > 200) = \int_{200}^{\infty} \frac{1}{160} e^{-x/160}\,dx = 0.2865$. The chance that the generator will last more than 
200 hours is about 28.65%. 
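
Because the exponential cdf has the closed form $F(x) = 1 - e^{-x/\beta}$ for $x \ge 0$, the three integrals in Example 3.2.13 reduce to differences of exponentials. A minimal Python sketch follows; the function name `exp_cdf` is a hypothetical helper.

```python
import math

def exp_cdf(x, beta):
    """P(X <= x) for an exponential random variable with mean beta."""
    if x <= 0:
        return 0.0
    return 1.0 - math.exp(-x / beta)

beta = 160  # mean operating time in hours, as in Example 3.2.13

p_a = exp_cdf(40, beta)                       # P(X < 40)
p_b = exp_cdf(160, beta) - exp_cdf(60, beta)  # P(60 < X < 160)
p_c = 1 - exp_cdf(200, beta)                  # P(X > 200)

print(f"(a) {p_a:.4f}  (b) {p_b:.4f}  (c) {p_c:.4f}")
```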


Another special case of gamma probability distribution that is useful in statistical inference problems 
is the chi-square distribution. 


Definition 3.2.8 Let n be a positive integer. A random variable, X, is said to have a chi-square ($\chi^2$) 
distribution with n degrees of freedom if and only if X is a gamma random variable with parameters 
$\alpha = n/2$ and $\beta = 2$. We denote this by $X \sim \chi^2(n)$. 

Hence, the probability density function of a chi-square distribution with n degrees of freedom is 
given by 

$$f(x) = \begin{cases} \dfrac{1}{\Gamma(n/2)\, 2^{n/2}} x^{(n/2) - 1} e^{-x/2}, & 0 \le x < \infty, \\ 0, & \text{otherwise.} \end{cases}$$

Figure 3.7 illustrates the dependence of the chi-square distribution on n. 


The mean and variance of a chi-square random variable follow directly from Theorem 3.2.6. 


FIGURE 3.7 Chi-square pdfs for different degrees of freedom (n = 2, 3, 4, and 5). 

MEAN, VARIANCE, AND MGF OF A CHI-SQUARE RANDOM VARIABLE 
Theorem 3.2.8 If X is a chi-square random variable with n degrees of freedom, then E(X) = n and 
Var(X) = 2n. Also, the moment-generating function is given by 

$$M_X(t) = \frac{1}{(1 - 2t)^{n/2}}, \quad t < \frac{1}{2}.$$


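Another class of distributions that plays a crucial role in Bayesian statistics (see Chapter 11) is the 
beta distribution. The beta distribution is used as a prior distribution for binomial or geometric 
proportions. A random variable X is said to have a beta distribution with parameters $\alpha$ and $\beta$ if and 
only if the density function of X is 

$$f(x) = \begin{cases} \dfrac{x^{\alpha - 1} (1 - x)^{\beta - 1}}{B(\alpha, \beta)}, & \alpha, \beta > 0;\ 0 \le x \le 1, \\ 0, & \text{otherwise,} \end{cases}$$

where $B(\alpha, \beta) = \int_0^1 x^{\alpha - 1} (1 - x)^{\beta - 1}\,dx$. It can be proved (see Exercise 3.2.31) that 
$B(\alpha, \beta) = \frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha + \beta)}$ 
and that $E(X) = \frac{\alpha}{\alpha + \beta}$ and $\mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}$. 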
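
These identities can be checked numerically before they are proved. The sketch below evaluates $B(\alpha, \beta)$ through `math.gamma` and compares the stated mean and variance with midpoint-rule integrals of the density. The parameter values $\alpha = 2$, $\beta = 5$, the step count, and the helper names are assumptions made only for illustration.

```python
import math

def beta_function(a, b):
    """B(a, b) computed from the gamma-function identity."""
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_pdf(x, a, b):
    """Density of the beta distribution with parameters a and b."""
    if not 0 <= x <= 1:
        return 0.0
    return x ** (a - 1) * (1 - x) ** (b - 1) / beta_function(a, b)

def midpoint_integral(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f over (lo, hi)."""
    h = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * h) for i in range(steps)) * h

a, b = 2.0, 5.0  # illustrative parameter values
total = midpoint_integral(lambda x: beta_pdf(x, a, b), 0, 1)
mean = midpoint_integral(lambda x: x * beta_pdf(x, a, b), 0, 1)
second = midpoint_integral(lambda x: x * x * beta_pdf(x, a, b), 0, 1)
var = second - mean**2

print(f"integral of pdf = {total:.6f}")
print(f"mean     = {mean:.6f}  vs  {a / (a + b):.6f}")
print(f"variance = {var:.6f}  vs  {a * b / ((a + b) ** 2 * (a + b + 1)):.6f}")
```

A numerical match of this kind is not a proof, but it is a quick way to catch a misremembered formula.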


One of the questions we may have is: “How do we know which distribution to use in a given physical 
problem?” There is no simple and direct answer to this question. One intuitive way is to construct 
a histogram from the information at hand; from the shape of this histogram, we decide whether 
the random variable follows a particular distribution such as the gamma distribution. Once we decide 
that it follows a particular distribution, then the parameters of this distribution, such as $\alpha$ and $\beta$, 
must be statistically estimated. In Chapter 5, we discuss how to estimate these parameters. Then a 
goodness-of-fit test can be performed to see whether the distribution model seems to be the right one. 


EXERCISES 3.2 


3.2.1. A fair coin is tossed 10 times. Let X denote the number of heads obtained. Find the following. 
(a) P(X =7) 
(b) P(X <7) 
(c) P(X > 0) 
(d) E(X) and Var(X) 


3.2.2. Let X be a Poisson random variable with $\lambda = 1/3$. Find 
(a) P(X =0) 
(b) P(X = 4). 


3.2.3. For a standard normal random variable Z, find the value of $z_0$ such that 
(a) $P(Z > z_0) = 0.05$ 
(b) $P(Z < z_0) = 0.88$ 
(c) $P(Z < z_0) = 0.10$ 
(d) $P(Z > z_0) = 0.95$. 


3.2.4. Let X ~ N(12, 5). Find the value of $x_0$ such that 
(a) $P(X > x_0) = 0.05$ 
(b) $P(X < x_0) = 0.98$ 
(c) $P(X < x_0) = 0.20$ 
(d) $P(X > x_0) = 0.90$. 


3.2.5. Let X ~ N(10, 25). Compute 
(a) P(X < 20) 
(b) P(X > 5) 
(c) P(12 < X < 15) 
(d) P(|X — 12| < 15). 


3.2.6. A quarterback on a football team has a pass completion rate of 0.62. If, in a given game, he 
attempts 16 passes, what is the probability that he will complete 
(a) 12 passes? 
(b) More than half of his passes? 
(c) Interpret your result. 
(d) Out of the 16 passes, what is the expected number of completions? 


3.2.7. A consulting group believes that 70% of the people in a certain county are satisfied with their 
health coverage. Assuming that this is true, find the probability that in a random sample of 
15 people from the county: 
(a) Exactly 10 are satisfied with their health coverage, and interpret. 
(b) Not more than 10 are satisfied with their health coverage, and interpret. 
(c) What is the expected number of people out of 15 that are satisfied with their health 
coverage? 


3.2.8. A man fires at a target six times; the probability of his hitting it each time is independent of 
other tries and is 0.40. 
(a) What is the probability that he will hit at least once? 
(b) How many times must he fire at the target so that the probability of hitting it at least 
once is greater than 0.77? 
(c) Interpret your findings. 


3.2.9. A certain electronics company produces a particular type of vacuum tube. It has been 
observed that, on the average, three tubes of 100 are defective. The company packs the 
tubes in boxes of 400. What is the probability that a certain box of 400 tubes will contain 
(a) r defective tubes? 

(b) At least k defective tubes? 
(c) At most one defective tube? 
(d) Interpret your answers to (a), (b), and (c). 


3.2.10. Suppose that, on average, in every two pages of a book there is one typographical error, and 
that the number of typographical errors on a single page of the book is a Poisson r.v. with 
$\lambda = 1/2$. What is the probability of at least one error on a certain page of the book? Interpret 
your result. 


3.2.11. Show that the probabilities assigned by the Poisson probability distribution satisfy the 
requirements that $0 \le p(x) \le 1$ for all x and $\sum_x p(x) = 1$. 


3.2.12. In determining the range of an acoustic source using the triangulation method, the time at 
which the spherical wave front arrives at a receiving sensor must be measured accurately. 
Measurement errors in these times can be modeled as possessing uniform probability dis- 
tribution from —0.05 to 0.05 microseconds. What is the probability that a particular arrival 
time measurement will be in error by less than 0.01 microsecond? What does your answer 
mean? 


3.2.13. The hardness of a piece of ceramic is proportional to the firing time. Assume that a rating 
system has been devised to rate the hardness of a ceramic piece and that this measure of 
hardness is a random variable that is distributed uniformly between 0 and 10. If a hardness 
in [5,9] is desirable for kitchenware, what is the probability that a piece chosen at random 
will be suitable for kitchen use? 


3.2.14. A receiver receives a string of 0s and 1s transmitted from a certain source. The receiver used 
a majority rule. That is, if the receiver acquires five symbols, of which three or more are 1s, it 
decides that a 1 was transmitted. The receiver is correct only 85% of the time. What is P(W), 
the probability of a wrong decision if the probabilities of receiving 0s and 1s are equally 
likely? What can you conclude from your result? 


3.2.15. The efficiency X of a certain electrical component may be assumed to be a random variable 
that is distributed uniformly between 0 and 100 units. What is the probability that X is: 
(a) Between 60 and 80 units? 
(b) Greater than 90 units? 
(c) Interpret (a) and (b). 


3.2.16. The reliability function of a system or a piece of equipment at time t is defined by 

$$R(t) = P(T > t) = 1 - F(t),$$

where T, the failure time, is a random variable with a known distribution. A certain vacuum 
tube has been observed to fail uniformly over the interval $[t_1, t_2]$. 
(a) Determine the reliability of such a tube at time t, $t_1 \le t \le t_2$. 
(b) If $180 \le t \le 220$, what is the reliability of such a tube at 200 hours? 
(c) The failure or hazard rate function $\rho(t)$ is defined by 

$$\rho(t) = \frac{f(t)}{1 - F(t)} = \frac{f(t)}{R(t)} = -\frac{R'(t)}{R(t)}.$$

Calculate the failure rate of this vacuum tube. Interpret your result. 


3.2.17. An electrical component was studied in the laboratory, and it was determined that its failure 
rate was approximately equal to 3 = 0.05. What is the reliability of such a component at 
10 hours? 


3.2.18. Suppose that the life length of a mechanical component is normally distributed. 
(a) If $\sigma = 3$ and $\mu = 100$, find the reliability of such a system at 105 hours. 
(b) What should be the expected life of the component if it has reliability of 0.90 for 120 
hours? 


3.2.19. A geologist defines granite as a rock containing quartz, feldspar, and small amounts of 
other minerals, provided that it contains not more than 75% quartz. If all the percentages 
are equally likely, what proportion of granite samples that the geologist collects during his 
lifetime will contain from 50% to 65% quartz? 


3.2.20. For a normal random variable with pdf 

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-(x - \mu)^2 / (2\sigma^2)}, \quad -\infty < x < \infty,$$

show that $\int_{-\infty}^{\infty} f(x)\,dx = 1$. [Hint: Use polar coordinates.] 


3.2.21. A professor in a large statistics class has a grading policy such that only the 15% of the 
students with the highest scores will receive the grade A. The mean score for this class is 72 
with a standard deviation of 6. Assuming that all the grades for this class follow a normal 
probability distribution, what is the minimum score that a student in this class has to get 
to receive an A grade? 


3.2.22. The scores, X, of an examination may be assumed to be normally distributed with $\mu = 70$ 
and $\sigma^2 = 49$. What is the probability that: 
(a) A score chosen at random will be between 80 and 85? 
(b) A score will be greater than 75? 
(c) A score will be less than 90? 
(d) Interpret the meaning of (a), (b), and (c). 


3.2.23. Suppose that the diameters of golf balls manufactured by a certain company are normally 
distributed with $\mu = 1.96$ in. and $\sigma = 0.04$ in. A golf ball will be considered defective if 
its diameter is less than 1.90 in. or greater than 2.02 in. What is the percentage of defective 
balls manufactured by the company? What does the answer indicate? 


3.2.24. Suppose that the arterial diastolic blood pressure readings in a population follow a normal 
probability distribution with mean 80 mm Hg and standard deviation 6.2 mm Hg. Suppose 
it is recommended that a physician be consulted if an individual has an arterial diastolic 
blood pressure reading of 90 mm Hg or more. If an individual is randomly picked from 
this population, what is the probability that this individual needs to consult a physician? 
Discuss the meaning of your result. 


3.2.25. In a certain pediatric population, systolic blood pressure is normally distributed with mean 
115 mm Hg and standard deviation 10 mm Hg. Find the probability that a randomly selected 
child from this population will have: 
(a) A systolic pressure greater than 125 mm Hg. 
(b) A systolic pressure less than 95 mm Hg. 
(c) A systolic pressure below which 95% of this population lies. 
(d) Interpret (a), (b), and (c). 


3.2.26. A physical fitness test was given to a large number of college freshmen. In part of the test, 
each student was asked to run as far as he or she could in 10 minutes. The distance each 
student ran in miles was recorded and can be considered to be a random variable, say X. 
The data showed that the random variable X followed the log-normal distribution with 
$\mu_y = 0.35$ and $\sigma_y = 0.5$, where Y = ln X. A student is considered physically fit if he or she 
is able to run 1.5 miles in the time allowed. What percentage of the college freshmen would 
be considered physically fit if we consider only this part of the test? 


3.2.27. An experimenter is designing an experiment to test tetanus toxoid in guinea pigs. The survival 
of the animal following the dose of the toxoid is a random phenomenon. Past experience 
has shown that the random variable that describes such a situation follows the log-normal 
distribution with $\mu_y = 0$ and $\sigma_y = 0.65$. As a requirement of good design the experimenter 
must choose doses at which the probability of surviving is 0.20, 0.50, and 0.80. What three 
doses should he choose? 


3.2.28. Show that $\Gamma(1) = 1$ and, for $\alpha > 1$, $\Gamma(\alpha) = (\alpha - 1)\Gamma(\alpha - 1)$. 


3.2.29. (a) Find the moment-generating function for a gamma probability distribution with 
parameters $\alpha > 0$ and $\beta > 0$. [Hint: In the integral representation of $E(e^{tX})$, change 
the variable x to $u = (1 - \beta t)x/\beta$, with $(1 - \beta t) > 0$.] 
(b) Using the mgf of a gamma probability distribution, find E(X) and Var(X). 


3.2.30. Let X be an exponential random variable. Show that, for numbers a > 0 and b > 0, 

$$P(X > a + b \mid X > a) = P(X > b).$$


(This property of the exponential distribution is called the memoryless property of the 
distribution.) 


3.2.31. A random variable X is said to have a beta distribution with parameters a and £ if and only 
if the density function of X is 


x21 (y—x)h-1 
—pa py «6 BS O;0<x<1 
f= me 
0, otherwise 


where B(a, B) = i. xe—-1q — x)Poldx, 
(a) Show that B(a@, B) = eee 


(b) Show that E(X) = ath and Var(X) = 


op 
(a+)? (+B+1)° 
3.2.32. The daily proportion of major automobile accidents across the United States can be treated 
as a random variable having a beta distribution with a = 6 and 6 = 4. Find the probability 
that, on a certain day, the percentage of major accidents is less than 80% but greater than 
60%. Interpret your answer.


3.2.33. Suppose that network breakdowns occur randomly and independently of each other at an
average rate of three per month.

(a) What is the probability that there will be just one network breakdown during December?
Interpret. 

(b) What is the probability that there will be at least four network breakdowns during 
December? Interpret. 

(c) What is the probability that there will be at most seven network breakdowns during 
December? Interpret. 


3.2.34. Let X be a random variable denoting the number of events occurring in the time interval 
(0, t]. Show that X has a gamma probability distribution with parameters n and 4. 


3.2.35. In order to etch an aluminum tray successfully, the pH of the acid solution used must be 
between 1 and 4. This acid solution is made by mixing a fixed quantity of etching compound 
in powder form with a given volume of water. The actual pH of the solution obtained by 
this method is affected by the potency of the etching compound, by slight variations in the 
volume of water used, and perhaps by the pH of the water. Thus, the pH of the solution 
varies. Assume that the random variable that describes the random phenomenon is gamma 
distributed with $\alpha = 2$ and $\beta = 1$.

(a) What is the probability that an acid solution made by the foregoing procedure will 
satisfactorily etch a tray? 
(b) What would the answer to part (a) be if $\alpha = 1$ and $\beta = 2$?


3.3 JOINT PROBABILITY DISTRIBUTIONS 


We have thus far confined ourselves to studying one-dimensional or univariate random variables and 
their properties. In many practical situations, we are required to deal with several, not necessarily 
independent random variables. For example, we might be interested in a study involving the weights 
and heights (W, H) of a certain group of persons. In this situation, we need the two random variables 
(W, H), and it is likely that these two are related. Then it becomes important to study the joint effect of 
these random variables, which will lead to finding the joint probability distributions. In this section, 
we confine our studies to two random variables and their joint distributions, which are called bivariate 
distributions. We consider the random variables to be either both discrete or both continuous. We now 
define joint distribution of two random variables. 


Definition 3.3.1 (a) Let X and Y be random variables. If both X and Y are discrete, then 
f(x, y) = P(X=x,Y=y) 
is called the joint probability function (joint pmf) of X and Y. 


(b) If both X and Y are continuous then f(x, y) is called the joint probability density function (joint 
pdf) of X and Y if and only if 


\[
P(a \le X \le b,\ c \le Y \le d) = \int_c^d \int_a^b f(x, y)\,dx\,dy.
\]


Example 3.3.1 
A probability class contains 10 African American, 8 Hispanic American, and 15 white students. If 12 
students are randomly selected from this class, and if X = number of African American students, and Y = number of
white students, find the joint probability function of the bivariate random variable (X, Y). 


Solution 
There are a total of 33 students. The number of ways in which x African American and y white students can
be picked (which means that the remaining $12 - (x + y)$ students are Hispanic American) can be obtained using
the multiplication principle as

\[ \binom{10}{x}\binom{15}{y}\binom{8}{12-x-y}. \]

The number of ways to pick 12 students from 33 students is $\binom{33}{12}$. Hence, the joint probability function is

\[
P(X = x, Y = y) = \frac{\dbinom{10}{x}\dbinom{15}{y}\dbinom{8}{12-x-y}}{\dbinom{33}{12}},
\]

where $0 \le x \le 10$, $0 \le y \le 12$, and $4 \le x + y \le 12$. The last constraint is needed because there are only
eight Hispanic Americans, so the combined number of whites and African Americans should be at
least 4.


We use the notation $\sum_{x,y}$ to denote $\sum_x \sum_y$. The joint distribution of two random variables has
to satisfy the following conditions.


Theorem 3.3.1 If X and Y are two random variables with joint probability function f (x, y), then 


1. $f(x, y) \ge 0$ for all x and y.
2. If X and Y are discrete, then $\sum_{x,y} f(x, y) = 1$,
where the sum is over all values (x, y) that are assigned nonzero probabilities. If X and Y are continuous, then
\[ \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x, y)\,dx\,dy = 1. \]
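
As a quick numerical check of condition 2, the joint probability function obtained in Example 3.3.1 can be summed over its support. The sketch below is only illustrative: the function name `joint_pmf` and its keyword parameters are our own choices, and exact rational arithmetic is used so that the total can be compared with 1 without rounding.

```python
from fractions import Fraction
from math import comb

def joint_pmf(x, y, n_aa=10, n_white=15, n_hisp=8, n_draw=12):
    """P(X = x, Y = y) for the class of Example 3.3.1 (hypergeometric-type counts)."""
    z = n_draw - x - y  # number of Hispanic American students drawn
    if x < 0 or y < 0 or z < 0 or x > n_aa or y > n_white or z > n_hisp:
        return Fraction(0)
    total = comb(n_aa + n_white + n_hisp, n_draw)
    return Fraction(comb(n_aa, x) * comb(n_white, y) * comb(n_hisp, z), total)

# Sum over a grid that covers the whole support; points outside it contribute 0.
total_probability = sum(joint_pmf(x, y) for x in range(11) for y in range(13))
print(total_probability)  # 1
```

Points with $x + y < 4$ automatically receive probability zero, which is the constraint noted in the example.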


Given the joint probability distribution (pdf or pmf), the probability distribution function of a 
component random variable can be obtained through the marginals.


Definition 3.3.2 The marginal pdf (or pmf) of X, denoted by $f_X(x)$ (or $f(x)$, when there is no confusion), is defined by

\[
f_X(x) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} f(x, y)\,dy, & \text{if X and Y are continuous,} \\[6pt] \displaystyle\sum_{\text{all } y} f(x, y), & \text{if X and Y are discrete.} \end{cases}
\]

Similarly, the marginal pdf (or pmf) of Y is defined by

\[
f_Y(y) = \begin{cases} \displaystyle\int_{-\infty}^{\infty} f(x, y)\,dx, & \text{if X and Y are continuous,} \\[6pt] \displaystyle\sum_{\text{all } x} f(x, y), & \text{if X and Y are discrete.} \end{cases}
\]


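Note that

\[
P(a \le X \le b) = \begin{cases} \displaystyle\int_a^b f_X(x)\,dx, & \text{if X and Y are continuous,} \\[6pt] \displaystyle\sum_{x} f_X(x), & \text{if X and Y are discrete,} \end{cases}
\]

where the summation is over all values of X from a to b.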


Example 3.3.2 
Find the marginal probability functions of the random variables X and Y, if their joint probability
function is given by Table 3.1.


Table 3.1


Solution
By definition, the marginal pmf of X is given by the column sums (summing over y for fixed x), and the
marginal pmf of Y is obtained by the row sums. Hence,

x_i         -1     3     5     otherwise
f_X(x_i)    0.5    0.4   0.1   0

y_j         -2     0     1     4     otherwise
f_Y(y_j)    0.4    0.3   0.1   0.2   0


Using the joint probability distribution and the marginals, we can now introduce the conditional
probability distribution function. 


Definition 3.3.3 The conditional probability distribution of the random variable X given Y is given by 


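\[
f(x \mid y) = f(x \mid Y = y) = \begin{cases} \dfrac{f(x, y)}{f_Y(y)}, & \text{if X and Y are continuous, } f_Y(y) \ne 0, \\[8pt] \dfrac{P(X = x, Y = y)}{f_Y(y)}, & \text{if X and Y are discrete, } f_Y(y) \ne 0. \end{cases}
\]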


We note that both the marginal probability densities of X and Y as well as the conditional pdf must 
satisfy the two important conditions of a pdf. 


We know that two events A and B are independent if $P(A \cap B) = P(A)P(B)$. It is usually more convenient
to establish independence through the probability functions. Hence, we define independence
for bivariate probability distributions as follows.


Definition 3.3.4 Let X and Y have a joint pmf or pdf f(x, y). Then X and Y are independent if and 
only if 


\[ f(x, y) = f_X(x) f_Y(y), \quad \text{for all x and y.} \]
That is, for independent random variables, the joint pdf is the product of the marginals.


Example 3.3.3 
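Let

\[
f(x, y) = \begin{cases} 3x, & 0 \le y \le x \le 1, \\ 0, & \text{otherwise.} \end{cases}
\]

(a) Find $P\left(X \le \tfrac12,\ \tfrac14 \le Y \le \tfrac34\right)$.

(b) Find the marginals $f_X(x)$ and $f_Y(y)$.

(c) Find the conditional $f(x \mid y)$ $(0 \le y \le 1)$. Also compute $f\left(x \mid Y = \tfrac12\right)$.

(d) Are X and Y independent?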


Solution 
(a) The domain of the function f(x, y) is given in Figure 3.8. The required probability
$P\left(X \le \tfrac12,\ \tfrac14 \le Y \le \tfrac34\right)$ is the volume over the area of the shaded region as shown by Figure 3.9.
That is,

\[
P\left(X \le \tfrac12,\ \tfrac14 \le Y \le \tfrac34\right) = \int_{1/4}^{1/2}\int_{1/4}^{x} 3x\,dy\,dx
= \int_{1/4}^{1/2} 3x\left(x - \tfrac14\right)dx
= \left.\left(x^3 - \frac{3x^2}{8}\right)\right|_{1/4}^{1/2}
= \frac{5}{128}.
\]

FIGURE 3.8 Domain of f(x, y): f(x, y) = 3x in the region $0 \le y \le x \le 1$.

FIGURE 3.9 Region of integration: the part of the domain with $0 \le x \le 1/2$ and $1/4 \le y \le 3/4$.


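(b) To find the marginals, we note that for each x, y varies from 0 to x $(0 \le y \le x)$. Therefore

\[
f_X(x) = \int_0^x 3x\,dy = 3x\left(y\,\Big|_0^x\right) = 3x^2, \quad 0 \le x \le 1.
\]

Similarly, for each y, x varies from y to 1.

\[
f_Y(y) = \int_y^1 3x\,dx = \left.\frac{3x^2}{2}\right|_y^1 = \frac32 - \frac32 y^2 = \frac32\left(1 - y^2\right), \quad 0 \le y \le 1.
\]

(c) Using the definition of conditional density,

\[
f(x \mid y) = \frac{f(x, y)}{f_Y(y)} = \frac{3x}{\frac32(1 - y^2)} = \frac{2x}{1 - y^2}, \quad y \le x \le 1.
\]

From this we have

\[
f\left(x \mid Y = \tfrac12\right) = \frac{2x}{1 - \frac14} = \frac{8x}{3}, \quad \tfrac12 \le x \le 1.
\]

(d) To check for independence of X and Y, note that

\[
f_X(1)\, f_Y\left(\tfrac12\right) = (3)\left(\tfrac98\right) = \tfrac{27}{8} \ne 3 = f\left(1, \tfrac12\right).
\]

Hence, X and Y are not independent.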
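
The results of Example 3.3.3 can be checked numerically. The sketch below is not part of the original solution: it uses a simple midpoint rule (the helper name `midpoint` and the step counts are our own choices) to approximate the probability in part (a) and to evaluate the marginals of part (b) at sample points.

```python
def midpoint(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def f(x, y):
    """Joint pdf of Example 3.3.3."""
    return 3 * x if 0 <= y <= x <= 1 else 0.0

# (a) y runs from 1/4 up to x, and x runs from 1/4 to 1/2.
prob = midpoint(lambda x: midpoint(lambda y: f(x, y), 0.25, x, n=50), 0.25, 0.5)

# (b) Marginals at sample points, to compare with 3x^2 and (3/2)(1 - y^2).
fx_at = midpoint(lambda y: f(0.6, y), 0.0, 0.6)   # f_X(0.6)
fy_at = midpoint(lambda x: f(x, 0.5), 0.5, 1.0)   # f_Y(0.5)

print(round(prob, 6), round(5 / 128, 6))
print(round(fx_at, 6), round(3 * 0.6**2, 6))
print(round(fy_at, 6), round(1.5 * (1 - 0.5**2), 6))
```

Each printed pair should agree to the displayed precision, the first number being the numerical approximation and the second the closed form derived above.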


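Recall that in the case of a univariate random variable X, with probability function f(x), we have

\[
E(X) = \begin{cases} \displaystyle\sum_x x f(x), & \text{if } \displaystyle\sum_x |x| f(x) < \infty, \text{ for a discrete random variable,} \\[6pt] \displaystyle\int x f(x)\,dx, & \text{if } \displaystyle\int |x| f(x)\,dx < \infty, \text{ for a continuous random variable.} \end{cases}
\]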


Now we define similar concepts for bivariate distribution. 


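Definition 3.3.5 Let f(x, y) be the joint probability function, and let g(x, y) be such that
$\sum_{x,y} |g(x, y)| f(x, y) < \infty$ in the discrete case, or $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} |g(x, y)| f(x, y)\,dx\,dy < \infty$ in the continuous
case. Then the expected value of g(X, Y) is given by

\[
E\,g(X, Y) = \begin{cases} \displaystyle\sum_{x,y} g(x, y) f(x, y), & \text{if X, Y are discrete,} \\[6pt] \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x, y) f(x, y)\,dx\,dy, & \text{if X, Y are continuous.} \end{cases}
\]

In particular,

\[
E(XY) = \begin{cases} \displaystyle\sum_{x,y} xy f(x, y), & \text{if X, Y are discrete,} \\[6pt] \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy f(x, y)\,dx\,dy, & \text{if X, Y are continuous.} \end{cases}
\]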


The following properties of mathematical expectation are easy to verify. 


PROPERTIES OF EXPECTED VALUE 
1. E(aX + bY) = aE(X) + bE(Y). 
2. If X and Y are independent, then E(XY) = E(X)E(Y). However, the converse is not necessarily true.


Example 3.3.4 
Let $f(x, y) = 3x$, $0 \le y \le x \le 1$.
(a) Find E(4X - 3Y).
(b) Find E(XY).

Solution
(a) $E(X) = \int x f_X(x)\,dx$ and $E(Y) = \int y f_Y(y)\,dy$.
Recall that earlier (Example 3.3.3) we computed $f_X(x) = 3x^2$ $(0 \le x \le 1)$ and $f_Y(y) =
\frac32(1 - y^2)$, $0 \le y \le 1$. Using these results, we have

\[
E(X) = \int_0^1 3x^3\,dx = \frac34,
\qquad
E(Y) = \int_0^1 y\,\frac32\left(1 - y^2\right)dy = \frac38.
\]

Hence,

\[ E(4X - 3Y) = 3 - \frac98 = \frac{15}{8}. \]

(b)
\[
E(XY) = \int_0^1\int_0^x xy\,(3x)\,dy\,dx = \frac{3}{10}.
\]
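
The two expectations in Example 3.3.4 can also be approximated directly from Definition 3.3.5, without first finding the marginals. The following is a minimal sketch under our own naming (`midpoint`, `expect`), using a nested midpoint rule over the triangle $0 \le y \le x \le 1$.

```python
def midpoint(func, a, b, n=400):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def expect(g, n=400):
    """Approximate E[g(X, Y)] for the joint pdf f(x, y) = 3x on 0 <= y <= x <= 1."""
    def inner(x):
        # Integrate g(x, y) * f(x, y) over y in [0, x] for a fixed x.
        return midpoint(lambda y: g(x, y) * 3 * x, 0.0, x, n)
    return midpoint(inner, 0.0, 1.0, n)

e_lin = expect(lambda x, y: 4 * x - 3 * y)   # compare with 15/8
e_xy = expect(lambda x, y: x * y)            # compare with 3/10
e_x = expect(lambda x, y: x)                 # compare with 3/4
e_y = expect(lambda x, y: y)                 # compare with 3/8

print(round(e_lin, 4), round(e_xy, 4), round(e_x, 4), round(e_y, 4))
```

Comparing `e_lin` with `4 * e_x - 3 * e_y` illustrates property 1 of expected values, and `e_xy` differs from `e_x * e_y`, as expected for dependent variables.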


Conditional expectations are defined in the same way as univariate expectations, except that the 
conditional density is utilized in place of the unconditional density function. 


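Definition 3.3.6 Let X and Y be jointly distributed with pmf or pdf f(x, y). Let g be a function of x. Then
the conditional expectation of g(X) given Y = y is

\[
E(g(X) \mid y) = E(g(X) \mid Y = y) = \begin{cases} \displaystyle\sum_{\text{all } x} g(x) f(x \mid y), & \text{if X, Y are discrete,} \\[6pt] \displaystyle\int_{-\infty}^{\infty} g(x) f(x \mid y)\,dx, & \text{if X, Y are continuous.} \end{cases}
\]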


Note that E(g(X) |y) is a function of y. If we let Y range over all of its possible values, the conditional 
expectation E(g(X)|Y) can be thought of as a function of the random variable Y. We will then be 
able to find the mean and variance of E(g(X) |Y ), as given in the following result, the proof of which 
is left as an exercise. 


Theorem 3.3.2 Let X and Y be two random variables. Then 
(a) E(X) = E[E(X|Y)].
(b) Var(X) = E[Var(X|Y)] + Var[E(X|Y)].
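
Both parts of Theorem 3.3.2 can be verified exactly on a small discrete example. The joint pmf below is hypothetical (its values were chosen only for illustration and do not come from the text), and the helper names are our own; exact fractions are used so that the two identities hold with equality rather than approximately.

```python
from fractions import Fraction as F

# Hypothetical joint pmf f(x, y) on a few points; the probabilities sum to 1.
pmf = {(0, 0): F(1, 8), (0, 1): F(1, 4), (1, 0): F(3, 8),
       (1, 1): F(1, 8), (2, 1): F(1, 8)}

def marginal_y(y):
    """f_Y(y), obtained by summing the joint pmf over x."""
    return sum(p for (_, yy), p in pmf.items() if yy == y)

def cond_moment(y, k):
    """E(X^k | Y = y), computed from the conditional pmf f(x | y)."""
    return sum(x**k * p for (x, yy), p in pmf.items() if yy == y) / marginal_y(y)

ys = sorted({y for _, y in pmf})
mean_x = sum(x * p for (x, _), p in pmf.items())
var_x = sum(x * x * p for (x, _), p in pmf.items()) - mean_x**2

cond_mean = {y: cond_moment(y, 1) for y in ys}
cond_var = {y: cond_moment(y, 2) - cond_moment(y, 1) ** 2 for y in ys}

tower = sum(cond_mean[y] * marginal_y(y) for y in ys)          # E[E(X|Y)]
mean_of_var = sum(cond_var[y] * marginal_y(y) for y in ys)     # E[Var(X|Y)]
var_of_mean = sum(cond_mean[y] ** 2 * marginal_y(y) for y in ys) - tower**2  # Var[E(X|Y)]

print(mean_x == tower, var_x == mean_of_var + var_of_mean)  # True True
```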
Example 3.3.5 
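Let X and Y be two random variables with joint density function given by

\[
f(x, y) = \begin{cases} x^2 + \dfrac{xy}{3}, & 0 \le x \le 1 \text{ and } 0 \le y \le 2, \\[4pt] 0, & \text{otherwise.} \end{cases}
\]

Find the conditional expectation, $E\left(X \mid Y = \tfrac12\right)$.


Solution
First we will find the conditional density, $f(x \mid y)$. The marginal is

\[
f_Y(y) = \int_0^1 \left(x^2 + \frac{xy}{3}\right)dx = \frac13 + \frac{y}{6}, \quad 0 \le y \le 2.
\]

Therefore,

\[
f(x \mid y) = \frac{f(x, y)}{f_Y(y)} = \frac{x^2 + \frac{xy}{3}}{\frac13 + \frac{y}{6}}, \quad 0 \le x \le 1.
\]

Hence,

\[
f\left(x \mid Y = \tfrac12\right) = \frac{x^2 + \frac{x}{6}}{\frac13 + \frac{1}{12}} = \frac{12}{5}\left(x^2 + \frac{x}{6}\right).
\]

Thus,

\[
E\left(X \mid Y = \tfrac12\right) = \int_0^1 x f\left(x \mid Y = \tfrac12\right)dx
= \int_0^1 \frac{12}{5}\,x\left(x^2 + \frac{x}{6}\right)dx = \frac{11}{15} \approx 0.733.
\]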
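
The conditional expectation in Example 3.3.5 can be approximated numerically by following Definition 3.3.6 step by step: compute the marginal of Y at the given point, then integrate x against the conditional density. This is a sketch with our own helper names and a midpoint rule, not the method used in the text.

```python
def midpoint(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def f(x, y):
    """Joint pdf of Example 3.3.5."""
    return x * x + x * y / 3 if 0 <= x <= 1 and 0 <= y <= 2 else 0.0

def cond_expectation_x(y):
    """E(X | Y = y) = integral of x * f(x, y) dx divided by f_Y(y)."""
    f_y = midpoint(lambda x: f(x, y), 0.0, 1.0)
    return midpoint(lambda x: x * f(x, y), 0.0, 1.0) / f_y

value = cond_expectation_x(0.5)
print(round(value, 4), round(11 / 15, 4))
```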


3.3.1 Covariance and Correlation 


We will now define the covariance and correlation coefficient of two random variables. 


Definition 3.3.7 (i) The covariance between two random variables X and Y is defined by
\[ \sigma_{XY} = \mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = E(XY) - \mu_X\mu_Y, \]
where $\mu_X = E(X)$ and $\mu_Y = E(Y)$.

(ii) The correlation coefficient, $\rho = \rho(X, Y)$, is defined by
\[ \rho = \frac{\mathrm{Cov}(X, Y)}{\sqrt{\mathrm{Var}(X)\,\mathrm{Var}(Y)}}. \]


Correlation is the measure of the linear relationship between the random variables X and Y. If $Y = aX + b$ $(a \ne
0)$, then $|\rho(X, Y)| = 1$. If dependence on X and Y needs to be specified, we will use the notation $\rho_{XY}$.


From the definition of the covariance of X and Y, we note that if small values of X, for which
$(X - \mu_X) < 0$, tend to be associated with small values of Y, for which $(Y - \mu_Y) < 0$, and similarly
large values of X with large values of Y, then $\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)]$ can be expected to be
positive. On the other hand, if small values of X tend to be associated with large values of Y and vice
versa so that $(X - \mu_X)$ and $(Y - \mu_Y)$ are of opposite signs, then $\mathrm{Cov}(X, Y) < 0$. Thus, covariance can
be thought of as a signed measure of the variation of Y relative to X. If X and Y are independent, then
it follows from the definition of covariance that Cov(X, Y) = 0. The correlation coefficient of X and
Y is a dimensionless quantity that measures the linear relationship between the random variables X
and Y.


PROPERTIES OF COVARIANCE AND CORRELATION COEFFICIENT 
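(a) $-1 \le \rho \le 1$.
(b) If X and Y are independent, then $\rho = 0$. The converse is not true.
(c) If $Y = aX + b$, then
\[ \mathrm{Cov}(X, Y) = a\,\mathrm{Var}(X). \]
Note that Cov(X, X) = Var(X).
(d) If $U = a_1X + b_1$ and $V = a_2Y + b_2$, then

(i) $\mathrm{Cov}(U, V) = a_1a_2\,\mathrm{Cov}(X, Y)$,

and

(ii) $\rho_{UV} = \begin{cases} \rho_{XY}, & \text{if } a_1a_2 > 0, \\ -\rho_{XY}, & \text{otherwise.} \end{cases}$

(e) $\mathrm{Var}(aX + bY) = a^2\mathrm{Var}(X) + b^2\mathrm{Var}(Y) + 2ab\,\mathrm{Cov}(X, Y)$.

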


Example 3.3.6 
The joint probability density of the random variables X and Y is given by

\[
f(x, y) = \begin{cases} \dfrac{1}{64}e^{-y/8}, & 0 < x < y < \infty, \\[4pt] 0, & \text{otherwise.} \end{cases}
\]

Find the covariance of X and Y.


Solution
We can use the formula Cov(X, Y) = E(XY) - E(X)E(Y). Now using integration by parts (three times) we
will get

\[
E(XY) = \int_0^{\infty}\int_0^{y} (xy)\,\frac{1}{64}e^{-y/8}\,dx\,dy
= \frac{1}{64}\int_0^{\infty} y e^{-y/8}\left(\int_0^{y} x\,dx\right)dy
= \frac{1}{128}\int_0^{\infty} y^3 e^{-y/8}\,dy = 192.
\]

We can also obtain

\[
E(X) = \int_0^{\infty}\int_0^{y} x\,\frac{1}{64}e^{-y/8}\,dx\,dy = 8
\]

and

\[
E(Y) = \int_0^{\infty}\int_0^{y} y\,\frac{1}{64}e^{-y/8}\,dx\,dy = 16.
\]

Thus, Cov(X, Y) = 192 - (8)(16) = 64.
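
The three integrals in Example 3.3.6 all reduce, after integrating out x, to integrals of the form $\int_0^\infty y^k e^{-y/8}\,dy = k!\,8^{k+1}$ (a standard gamma-integral identity, which is what the repeated integration by parts produces). The short sketch below, with a helper name of our own choosing, uses that identity to reproduce the values exactly.

```python
from fractions import Fraction
from math import factorial

def gamma_integral(k, scale=8):
    """Integral of y**k * exp(-y/scale) over (0, infinity): k! * scale**(k + 1)."""
    return factorial(k) * scale ** (k + 1)

c = Fraction(1, 64)  # constant in the joint pdf

# Integrating over 0 < x < y first leaves a single integral in y in each case.
e_xy = c * Fraction(1, 2) * gamma_integral(3)  # inner integral of x*y dx is y^3 / 2
e_x = c * Fraction(1, 2) * gamma_integral(2)   # inner integral of x dx is y^2 / 2
e_y = c * gamma_integral(2)                    # inner integral of y dx is y^2
cov = e_xy - e_x * e_y

print(e_xy, e_x, e_y, cov)  # 192 8 16 64
```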


Next we will define the moment-generating function for the bivariate distributions. 


Definition 3.3.8 Let X and Y be jointly distributed. Then the joint moment-generating function is 
defined by 


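\[
M_{(X,Y)}(t_1, t_2) = E\left(e^{t_1X + t_2Y}\right)
= \begin{cases} \displaystyle\sum_y\sum_x e^{t_1x + t_2y} f(x, y), & \text{if X and Y are discrete,} \\[6pt] \displaystyle\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{t_1x + t_2y} f(x, y)\,dx\,dy, & \text{if X and Y are continuous.} \end{cases}
\]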


EXERCISES 3.3 
3.3.1. An experiment consists of drawing four objects from a container, which holds eight operable,
six defective, and 10 semioperable objects. Let X be the number of operable objects drawn
and Y the number of defective objects drawn.
(a) Find the joint probability function of the bivariate random variable (X, Y).
(b) Find P(X = 3, Y =0). 

(c) Find P(X <3, Y = 1). 

(d) Give a graphical presentation of (a), (b), and (c). 


3.3.2. Let

\[
f(x, y) = \begin{cases} \dfrac{1}{50}\left(x^2 + 2y\right), & x = 0, 1, 2, 3 \text{ and } y = x + 3, \\[4pt] 0, & \text{otherwise.} \end{cases}
\]


Show that f(x, y) satisfies the conditions of a probability density function. 
3.3.3. Let 


\[ f(x, y) = c(1 - x)(1 - y), \quad -1 \le x \le 1, \ -1 \le y \le 1. \]


Find the c that makes f(x, y) the joint probability density function of the random variable 
(X,Y). 


3.3.4. Let 
f@,y)=xe*, x>0, y=. 


Is f(x, y) a probability density function? If not, find the proper constant to multiply with
f(x, y) so that it will be a probability density.


3.3.5. Find the marginal probability density function of the random variables X and Y, if their 
joint probability density function is given in Table 3.3.1. 


Table 3.3.1 


3.3.6. Find the marginal density functions of the random variables X and Y if their joint probability
density function is given by

\[
f(x, y) = \begin{cases} \dfrac15(3x - y), & 1 \le x \le 2, \ 1 \le y \le 3, \\[4pt] 0, & \text{otherwise.} \end{cases}
\]


3.3.7. Determine the conditional probability P(X = -1 | Y = 0) for the random variables defined
in Problem 3.3.5.


3.3.8. Find k so that $f(x, y) = kxy$, $1 \le x \le y \le 2$, will be a probability density function. Also find
(i) P(X < 3, Y < 3), and (ii) P(X + Y < 3).


3.3.9. The random variables X and Y have a joint density

\[
f(x, y) = \begin{cases} \dfrac89\,xy, & 1 \le x \le y \le 2, \\[4pt] 0, & \text{elsewhere.} \end{cases}
\]

Find:
(a) The marginal of X.
(b) P(1.5 < X < 1.75, Y > 1).


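3.3.10. The joint pdf of X and Y is

\[
f(x, y) = \begin{cases} \dfrac{1}{28}(4x + 2y + 1), & 0 \le x < 2, \ 0 \le y < 2, \\[4pt] 0, & \text{elsewhere.} \end{cases}
\]

Find (a) $f_X(x)$ and $f_Y(y)$, and (b) $f(y \mid x)$.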

3.3.11. Find the joint mgf of the random variables (X, Y) defined in Problem 3.3.9.

3.3.12. The joint density of a random variable (X, Y) is given by

\[
f(x, y) = \begin{cases} \dfrac{x^3y^3}{16}, & 0 \le x \le 2, \ 0 \le y \le 2, \\[4pt] 0, & \text{elsewhere.} \end{cases}
\]

(a) Find marginals of X and Y, and (b) find $f(y \mid x)$.


3.3.13. The joint probability function of a discrete random variable (X, Y) is given by


Oxy = a 
fle.) = | LawerbOmey | P= Te 
0, otherwise. 


Find (a) f(x|y), and (b) f(y|x). 
[Hint: )7"_, i? = (n(n + 1)(2n + 1))/6.] 


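3.3.14. Consider bivariate random variables with the density

\[
f(x, y) \propto \binom{n}{x} y^{x+\alpha-1}(1 - y)^{n-x+\beta-1}, \quad \text{for } x = 0, 1, \ldots, n \text{ and } 0 \le y \le 1.
\]

Verify that
\[ f(x \mid y) \propto \binom{n}{x} y^{x}(1 - y)^{n-x} \]
and
\[ f(y \mid x) \propto y^{x+\alpha-1}(1 - y)^{n-x+\beta-1}. \]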

3.3.15. The joint density function of the discrete random variable (X, Y) is given in Table 3.3.2. 


Table 3.3.2 


y 
x 1 2 3 
ead 
26 2 TG 
3 + 4 0 


(a) Find E(XY). 
(b) Find Cov(X, Y). 
(c) Find the correlation coefficient px y. 


3.3.16. The joint probability function of the continuous random variable (X, Y) is given by

\[
f(x, y) = \begin{cases} \dfrac{1}{28}(4x + 2y + 1), & 0 \le x < 2, \ 0 \le y < 2, \\[4pt] 0, & \text{otherwise.} \end{cases}
\]

(a) Find E(XY).
(b) Find Cov(X, Y).
(c) Find the correlation coefficient $\rho_{XY}$.


3.3.17. Let X and Y be random variables and U = aX + b, V = cY + d, where a, b, c, d are constants.
Show that
\[ \rho_{UV} = \begin{cases} \rho_{XY}, & \text{if } ac > 0, \\ -\rho_{XY}, & \text{otherwise.} \end{cases} \]


3.3.18. Let X be a random variable, and let Y = aX + b, where a and b are
constants. Show that (a) $\rho_{XY} = 1$ if a > 0, and (b) $\rho_{XY} = -1$ if a < 0.


3.3.19. If $|\rho_{XY}| = 1$, then prove that P(Y = aX + b) = 1 for some constants a and b.


3.3.20. Let X and Y be two random variables with joint density function

\[
f(x, y) = \begin{cases} 8xy, & 0 \le x \le y \le 1, \\ 0, & \text{otherwise.} \end{cases}
\]


(a) Find the conditional expectation, E(X|Y = 3). 
(b) Find Cov(X, Y). 


3.3.21. Let X and Y be two random variables with joint density function

\[
f(x, y) = \begin{cases} e^{-y}, & 0 \le x \le y, \\ 0, & \text{otherwise.} \end{cases}
\]


(a) Find the conditional expectation, E(X|Y = y). 
(b) Find Cov(X, Y). 
(c) Are X and Y independent? Why? 


3.3.22. Let

\[
f(x, y) = \frac{c}{(1 + x^2)\sqrt{1 - y^2}}, \quad -\infty < x < \infty, \ -1 < y < 1.
\]
Find the c that makes f(x, y) the probability density function of the random variable (X, Y). 
Determine whether X and Y are independent. 


3.3.23. If the random variables X and Y are independent and have equal variances, what is the 
coefficient of correlation between the random variables X and aX + Y, where a is a constant? 


3.4 FUNCTIONS OF RANDOM VARIABLES 


In this section we discuss the methods of finding the probability distribution of a function of a 
random variable X. We are given the distribution of X, and we are required to find the distribution of 
g(X). There are many physical problems that call for the derivation of the distribution of a function 
of a random variable. The following is one of the classical examples. The velocity V of a gas molecule 
(Maxwell-Boltzmann law) behaves as a gamma-distributed random variable. We would like to derive 
the distribution of $E = \tfrac12 mV^2$, the kinetic energy of the gas molecule. Because the value of the velocity is
the outcome of a random experiment, so is the value of E. This is a problem of finding the distribution 
of a function of a random variable E = g(V). We now illustrate various techniques for finding the 
distribution of g(X) by means of examples. 


3.4.1 Method of Distribution Functions 


Basically the method of distribution functions is as follows. If X is a random variable with pdf $f_X(x)$
and if Y is some function of X, then we can find the cdf $F_Y(y) = P(Y \le y)$ directly by integrating
$f_X(x)$ over the region for which $\{Y \le y\}$. Now, by differentiating $F_Y(y)$, we get the probability density
function $f_Y(y)$ of Y. In general, if Y is a function of random variables $X_1, \ldots, X_n$, say $g(X_1, \ldots, X_n)$,
then we can summarize the method of distribution functions as follows.


PROCEDURE TO FIND CDF OF A FUNCTION OF R.V. USING THE METHOD OF DISTRIBUTION FUNCTIONS
1. Find the region $\{Y \le y\}$ in the $(x_1, x_2, \ldots, x_n)$ space, that is, find the set of $(x_1, x_2, \ldots, x_n)$ for which
$g(x_1, \ldots, x_n) \le y$.
2. Find $F_Y(y) = P(Y \le y)$ by integrating $f(x_1, x_2, \ldots, x_n)$ over the region $\{Y \le y\}$.
3. Find the density function $f_Y(y)$ by differentiating $F_Y(y)$.




Example 3.4.1 
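Let $X \sim N(0, 1)$. Using the cdf of X, find the pdf of $X^2$.


Solution
Let $Y = X^2$. Note that the pdf of X is

\[ f(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}, \quad -\infty < x < \infty. \]

Then the cumulative distribution function of Y for a given $y \ge 0$ is

\[
F(y) = P(Y \le y) = P(X^2 \le y) = P(-\sqrt{y} \le X \le \sqrt{y})
= \int_{-\sqrt{y}}^{\sqrt{y}} \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx
= 2\int_0^{\sqrt{y}} \frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx \quad (\text{by the symmetry of } e^{-x^2/2}).
\]

Hence, by differentiating $F(y)$, we obtain the probability density function as

\[
f_Y(y) = \begin{cases} \dfrac{2}{\sqrt{2\pi}}e^{-y/2}\,\dfrac{1}{2\sqrt{y}} = \dfrac{1}{\sqrt{2\pi}}\,y^{-1/2}e^{-y/2}, & 0 < y < \infty, \\[4pt] 0, & \text{otherwise.} \end{cases}
\]

This is a $\chi^2$-distribution with 1 degree of freedom.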
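
The differentiation step in Example 3.4.1 can be checked numerically. In the sketch below (function names are ours), the cdf $P(-\sqrt{y} \le X \le \sqrt{y})$ is written with the error function as $\operatorname{erf}(\sqrt{y/2})$, and a central difference of that cdf is compared with the density derived above.

```python
from math import erf, exp, pi, sqrt

def cdf_y(y):
    """F_Y(y) = P(-sqrt(y) <= X <= sqrt(y)) for X ~ N(0, 1)."""
    return erf(sqrt(y / 2)) if y > 0 else 0.0

def pdf_y(y):
    """Density derived in Example 3.4.1 (chi-square with 1 degree of freedom)."""
    return exp(-y / 2) / sqrt(2 * pi * y) if y > 0 else 0.0

def numeric_derivative(func, y, h=1e-5):
    """Central-difference approximation of func'(y)."""
    return (func(y + h) - func(y - h)) / (2 * h)

for y in (0.5, 1.0, 2.5):
    print(y, round(numeric_derivative(cdf_y, y), 6), round(pdf_y(y), 6))
```

The two printed columns should agree at each y, which is step 3 of the procedure carried out numerically.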


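The same method can be used for the discrete case.


Example 3.4.2
Suppose that the random variable X has a Poisson probability distribution

\[
f(x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^x}{x!}, & x = 0, 1, 2, \ldots, \\[4pt] 0, & \text{otherwise.} \end{cases}
\]

Find the cumulative distribution function of Y = aX + b.


Solution
Assuming a > 0, the cdf of Y is given by

\[
F(y) = P(Y \le y) = P(aX + b \le y) = P\left(X \le \frac{y - b}{a}\right) = \sum_{x=0}^{[(y-b)/a]} \frac{e^{-\lambda}\lambda^x}{x!},
\]

where [x] is the largest integer less than or equal to x. Therefore,

\[
F(y) = \begin{cases} 0, & y < b, \\[4pt] \displaystyle\sum_{x=0}^{[(y-b)/a]} \frac{e^{-\lambda}\lambda^x}{x!}, & y \ge b. \end{cases}
\]

It should be noted here that the pmf, $f_Y(y)$ of Y, can be found from the jump of the cdf at each support point,

\[ f_Y(y) = F_Y(y) - F_Y(y - a), \quad \text{for } y = an + b, \ n = 0, 1, 2, \ldots \]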
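
The cdf in Example 3.4.2 translates almost directly into code. The following sketch assumes a > 0 and uses function and parameter names of our own choosing; the pmf is recovered as the jump of the cdf between consecutive support points $y - a$ and $y$.

```python
from math import exp, factorial, floor

def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam); zero when k < 0."""
    if k < 0:
        return 0.0
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(k + 1))

def cdf_y(y, a, b, lam):
    """F_Y(y) for Y = aX + b, assuming a > 0."""
    return poisson_cdf(floor((y - b) / a), lam)

def pmf_y(y, a, b, lam):
    """f_Y(y) at a support point y = a*n + b, as the jump of F_Y at y."""
    return cdf_y(y, a, b, lam) - cdf_y(y - a, a, b, lam)

# Illustration with a = 2, b = 1, lambda = 3: the point y = 7 corresponds to X = 3.
print(cdf_y(0.5, 2, 1, 3))                  # 0.0, since y < b
print(round(pmf_y(7, 2, 1, 3), 6))
print(round(exp(-3) * 3**3 / factorial(3), 6))
```

The last two printed values should coincide, since $P(Y = 7) = P(X = 3)$ for this choice of a and b.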


The multivariate case (in particular, the bivariate case), though more difficult, can be handled similarly. 


3.4.2 The pdf of Y = g(X), Where g Is Differentiable and Monotone 
Increasing or Decreasing 


We now consider the distribution of a random variable Y = g(X), where X is a continuous random
variable with pdf $f_X(x)$. Assume that g is differentiable and the inverse function $g^{-1}$ of g exists. Let
$X = g^{-1}(Y)$. Then the density function of Y can
be obtained using the method just given. Thus,

\[
f_Y(y) = f_X\left(g^{-1}(y)\right)\left|\frac{d}{dy}g^{-1}(y)\right|.
\]


This is a special case of the transformation method, which is explained later in Subsection 3.4.5.


Example 3.4.3
Let $X \sim N(0, 1)$. Find the pdf of $Y = e^X$.


Solution
Here $g(x) = e^x$, and hence, $g^{-1}(y) = \ln(y)$. Thus, $\frac{d}{dy}g^{-1}(y) = \frac{1}{y}$.
Also,
\[ f_X(x) = \frac{1}{\sqrt{2\pi}}e^{-x^2/2}, \quad -\infty < x < \infty. \]
Therefore, the pdf of Y is
\[
f_Y(y) = \begin{cases} \dfrac{1}{y\sqrt{2\pi}}e^{-[\ln(y)]^2/2}, & y > 0, \\[4pt] 0, & \text{otherwise.} \end{cases}
\]
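
The change-of-variable formula of this subsection can be written as a small generic helper. In the sketch below, `transformed_pdf` is a hypothetical name of ours; it takes the density of X, the inverse $g^{-1}$, and the derivative of the inverse, and returns the density of Y = g(X). It is applied to Example 3.4.3 and compared with a numerical derivative of the cdf $P(Y \le y) = \Phi(\ln y)$.

```python
from math import erf, exp, log, pi, sqrt

def std_normal_pdf(x):
    return exp(-x * x / 2) / sqrt(2 * pi)

def std_normal_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def transformed_pdf(f_x, g_inv, dg_inv):
    """Density of Y = g(X) for monotone g: f_X(g^{-1}(y)) * |d/dy g^{-1}(y)|."""
    return lambda y: f_x(g_inv(y)) * abs(dg_inv(y))

# Example 3.4.3: g(x) = e^x, so g^{-1}(y) = ln y with derivative 1/y (for y > 0).
pdf_y = transformed_pdf(std_normal_pdf, log, lambda y: 1 / y)

def cdf_y(y):
    """P(Y <= y) = P(X <= ln y) for y > 0."""
    return std_normal_cdf(log(y))

h = 1e-5
for y in (0.5, 1.0, 3.0):
    slope = (cdf_y(y + h) - cdf_y(y - h)) / (2 * h)
    print(y, round(pdf_y(y), 6), round(slope, 6))
```

The helper is only valid on the range of g, here y > 0; outside that range the density is zero, as stated in the example.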


3.4.3 Probability Integral Transformation 
Let X be a continuous random variable, with pdf f and cdf F. Let Y = F(X). Then,

\[
P(Y \le y) = P(F(X) \le y) = P\left(X \le F^{-1}(y)\right)
= \int_{-\infty}^{F^{-1}(y)} f(x)\,dx = F(x)\Big|_{-\infty}^{F^{-1}(y)} = y.
\]

Hence,

\[
f(y) = \begin{cases} 1, & 0 < y < 1, \\ 0, & \text{otherwise.} \end{cases}
\]

Thus, Y has a U(0, 1) distribution. The transformation Y = F(X) is called a probability integral
transformation. It is interesting to note that irrespective of the pdf of X, Y is always uniform
in (0, 1).
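
A small simulation illustrates the probability integral transformation. The choice of distribution is our assumption for illustration only: X is taken to be exponential with rate 1, so that $F(x) = 1 - e^{-x}$. If Y = F(X) is uniform on (0, 1), then its sample mean should be near 1/2 and the fraction of values at or below q should be near q.

```python
import random
from math import exp

random.seed(0)  # fixed seed so that the run is reproducible

# Assumed for illustration: X ~ exponential(1), with cdf F(x) = 1 - exp(-x).
xs = [random.expovariate(1.0) for _ in range(100_000)]
us = [1 - exp(-x) for x in xs]  # Y = F(X)

mean_u = sum(us) / len(us)
frac_below = {q: sum(u <= q for u in us) / len(us) for q in (0.25, 0.5, 0.75)}

print(round(mean_u, 3))
print({q: round(p, 3) for q, p in frac_below.items()})
```

Read in the other direction, the same fact is what makes inverse-transform sampling work: applying $F^{-1}$ to uniform random numbers produces draws with cdf F.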


Example 3.4.4 
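Let X be a normal random variable with mean $\mu$ and variance $\sigma^2$. Thus,

\[
f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}e^{-(x-\mu)^2/2\sigma^2}, \quad -\infty < x < \infty, \ -\infty < \mu < \infty, \text{ and } \sigma^2 > 0.
\]

Let $Y = \displaystyle\int_{-\infty}^{X} \frac{1}{\sqrt{2\pi}\,\sigma}e^{-(t-\mu)^2/2\sigma^2}\,dt$. Then Y = F(X), where F is the cdf of the normal random
variable X. Therefore Y is uniform on (0, 1). That is,

\[
f(y) = \begin{cases} 1, & \text{if } 0 < y < 1, \\ 0, & \text{otherwise.} \end{cases}
\]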


3.4.4 Functions of Several Random Variables: Method of Distribution 
Functions 


We now discuss the distribution of Y, when Y is a function of several random variables, $Y =
g(X_1, \ldots, X_n)$.




Example 3.4.5 
Let $X_1, \ldots, X_n$ be continuous iid random variables with pdf f(x) (cdf F(x)). Find the pdfs of
$Y_1 = \min(X_1, \ldots, X_n)$ and $Y_n = \max(X_1, \ldots, X_n)$.

Solution
For the random variable $Y_1$, we have

\[
\begin{aligned}
1 - F_{Y_1}(y) &= P(Y_1 > y) \\
&= P(X_1 > y, X_2 > y, \ldots, X_n > y) \\
&= P(X_1 > y)P(X_2 > y) \cdots P(X_n > y) \quad \text{(because of independence)} \\
&= (1 - F(y))^n.
\end{aligned}
\]

This implies
\[ F_{Y_1}(y) = 1 - (1 - F(y))^n \]
and
\[ f_{Y_1}(y) = n(1 - F(y))^{n-1} f(y). \]

Consider $Y_n$. Its cdf is given by
\[ F_{Y_n}(y) = P(Y_n \le y) = (F(y))^n. \]
This implies that
\[ f_{Y_n}(y) = n(F(y))^{n-1} f(y). \]
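
The two cdfs obtained in Example 3.4.5 are easy to check by simulation. As an assumption made only for this illustration, the $X_i$ are taken to be uniform on (0, 1), so that F(y) = y; the sample size n = 5 and the evaluation points are likewise arbitrary choices of ours.

```python
import random

random.seed(1)  # fixed seed so that the run is reproducible

n, trials = 5, 100_000
samples = [[random.random() for _ in range(n)] for _ in range(trials)]

# Empirical P(max <= 0.8), to compare with F(0.8)^n = 0.8^5.
p_max = sum(max(s) <= 0.8 for s in samples) / trials
# Empirical P(min <= 0.2), to compare with 1 - (1 - F(0.2))^n = 1 - 0.8^5.
p_min = sum(min(s) <= 0.2 for s in samples) / trials

print(round(p_max, 3), round(0.8**n, 3))
print(round(p_min, 3), round(1 - 0.8**n, 3))
```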
3.4.5 Transformation Method


A simple generalization of the method of distribution functions to functions of more than one variable
is the transformation method. We illustrate the method for bivariate distributions. The method is similar
for the multivariate case. Let the joint pdf of (X, Y) be f(x, y). Let $U = g_1(X, Y)$; $V = g_2(X, Y)$. The
mapping from (X, Y) to (U, V) is assumed to be one-to-one and onto. Hence, there are functions $h_1$
and $h_2$ such that

\[ x = h_1(u, v) \]
and
\[ y = h_2(u, v). \]

Define the Jacobian of the transformation J by

\[
J = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial x}{\partial v} \\[8pt] \dfrac{\partial y}{\partial u} & \dfrac{\partial y}{\partial v} \end{vmatrix}.
\]

Then the joint pdf of U and V is given by

\[ f(u, v) = f\left(h_1(u, v), h_2(u, v)\right)|J|. \]


Example 3.4.6 
Let X and Y be independent random variables with common pdf $f(x) = e^{-x}$, $(x > 0)$. Find the joint pdf of
U = X/(X + Y), V = X + Y.


Solution
We have U = X/(X + Y) = X/V. Hence, X = UV and Y = V - X = V - UV = V(1 - U). Thus, the Jacobian is

\[
J = \begin{vmatrix} v & u \\ -v & 1 - u \end{vmatrix} = v(1 - u) + uv = v.
\]

Then $|J| = v$ $(> 0)$. Note that $0 \le u \le 1$, $0 < v < \infty$. Therefore,

\[
f(u, v) = f\left(h_1(u, v), h_2(u, v)\right)|J| = e^{-uv}e^{-v(1-u)}\,v = ve^{-v}, \quad 0 \le u \le 1, \ 0 < v < \infty.
\]

Suppose we want the marginals $f_U(u)$ and $f_V(v)$, that is,

\[ f_V(v) = \int_0^1 ve^{-v}\,du = ve^{-v}, \quad 0 < v < \infty \]

and

\[ f_U(u) = \int_0^{\infty} ve^{-v}\,dv = 1, \quad 0 \le u \le 1. \]

Sometimes the expressions for two variables, U and V, may not be given. Only one expression is
available. In that case, call the given expression of X and Y as U, and define V = Y. Then, we can use
the previous method to first find the joint density and then find the marginal to obtain the pdf of U.
The following example demonstrates the method.


Example 3.4.7 
Let X and Y be independent random variables uniformly distributed on [0, 1]. Find the distribution of X+Y. 


Solution 
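Let

\[ U = X + Y, \qquad V = Y. \]

Here $f(x, y) = 1$, $0 \le x \le 1$, $0 \le y \le 1$. Solving for X and Y gives

\[ X = U - V, \qquad Y = V, \]

so that

\[ J = \begin{vmatrix} 1 & -1 \\ 0 & 1 \end{vmatrix} = 1. \]

Thus, we have

\[
f(u, v) = \begin{cases} 1, & 0 \le u - v \le 1, \ 0 \le v \le 1, \\ 0, & \text{otherwise.} \end{cases}
\]

Because V is the variable we introduced, to get the pdf of U, we just need to find the marginal pdf from the
joint pdf. From Figure 3.10, the regions of integration are $0 \le u \le 1$ and $1 < u \le 2$. That is,

\[
f_U(u) = \int f(u, v)\,dv = \begin{cases} \displaystyle\int_0^{u} 1\,dv = u, & 0 \le u \le 1, \\[6pt] \displaystyle\int_{u-1}^{1} 1\,dv = 2 - u, & 1 < u \le 2, \\[6pt] 0, & \text{otherwise.} \end{cases}
\]

FIGURE 3.10 The regions of integration.

FIGURE 3.11 Graph of $f_U(u)$.

Figure 3.11 shows the graph of $f_U(u)$.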
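
The marginalization step in Example 3.4.7 amounts to measuring, for each u, the set of v with $0 \le v \le 1$ and $0 \le u - v \le 1$. The sketch below (names are ours) computes that length directly and then checks, with a midpoint rule, that the resulting density integrates to 1.

```python
def f_u(u):
    """Marginal pdf of U = X + Y: length of {v in [0, 1] : 0 <= u - v <= 1}."""
    lower = max(0.0, u - 1.0)   # v >= 0 and v >= u - 1
    upper = min(1.0, u)         # v <= 1 and v <= u
    return max(0.0, upper - lower)

def midpoint(func, a, b, n=2000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

total = midpoint(f_u, 0.0, 2.0)    # should be 1 for a density
p_half = midpoint(f_u, 0.0, 0.5)   # P(U <= 1/2) = 1/8 from the triangle

print(f_u(0.5), f_u(1.0), f_u(1.5), f_u(2.5))
print(round(total, 6), round(p_half, 6))
```

The values of `f_u` rise linearly to 1 at u = 1 and fall back to 0 at u = 2, which is the triangular shape described for Figure 3.11.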


EXERCISES 3.4 
3.4.1. Let X be a uniformly distributed random variable over (0, a). Find the pdf of Y=cX +d. 
3.4.2. The joint pdf of (X, Y) is

\[
f(x, y) = \frac{1}{\theta^2}e^{-\frac{x+y}{\theta}}, \quad x, y > 0, \ \theta > 0.
\]

Find the pdf of U = X - Y.


3.4.3. Let f(x, y) be the probability density function of the continuous random variable (X, Y). If
U = XY, show that the probability density function of U is given by

\[
f_U(u) = \int_{-\infty}^{\infty} f\left(\frac{u}{v}, v\right)\left|\frac{1}{v}\right|dv.
\]


3.4.4. The joint pdf of X and Y is

f(x, y) = be" FH") 9 5 0, x > 0. 
Find the pdf of XY. 
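3.4.5. If the joint pdf of (X, Y) is

\[
f(x, y) = \frac{1}{2\pi\sigma_1\sigma_2}\exp\left\{-\frac12\left(\frac{x^2}{\sigma_1^2} + \frac{y^2}{\sigma_2^2}\right)\right\}, \quad -\infty < x < \infty, \ -\infty < y < \infty; \ \sigma_1, \sigma_2 > 0,
\]

find the pdf of $X^2 + Y^2$.

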
3.4.6. Let $X_1, \ldots, X_n$ be independent and identically distributed random variables with pdf $f(x) =
(1/\theta)e^{-x/\theta}$, $x > 0$, $\theta > 0$. Find the pdf of $\sum_{i=1}^{n} X_i$.


3.4.7. Let f(x, y) be the pdf of the continuous random variable (X, Y). If U = X + Y, then show
that the probability density function of U is given by

\[
f_U(u) = \int_{-\infty}^{\infty} f(x, u - x)\,dx.
\]


3.4.8. Let X be uniformly distributed over (-2, 2) and $Y = X^2$. Find Cov(X, Y). Are X and Y
independent?


3.4.9. Let $X \sim N(\mu, \sigma^2)$. Show that

(a) $Z = \dfrac{X - \mu}{\sigma}$ is N(0, 1).

(b) $U = \left(\dfrac{X - \mu}{\sigma}\right)^2$ is $\chi^2(1)$.


3.4.10. Let $X \sim N(\mu, \sigma^2)$. Find the pdf of $Y = e^X$.


3.4.11. The probability density of the velocity, V, of a gas molecule, according to the Maxwell-
Boltzmann law, is given by

\[
f(v; \beta) = \begin{cases} cv^2e^{-\beta v^2}, & v > 0, \\ 0, & \text{elsewhere,} \end{cases}
\]

where c is an appropriate constant and $\beta$ depends on the mass of the molecule and the
absolute temperature. Find the density function of the kinetic energy E, which is given by
$E = g(V) = \tfrac12 mV^2$.


3.4.12. Let X and Y be two independent random variables, each normally distributed, with parameters
$(0, \sigma_1^2)$ and $(0, \sigma_2^2)$, respectively. Show that the probability density function of
U = X/Y is given by

\[
f_U(u) = \frac{\sigma_1\sigma_2}{\pi\left(\sigma_2^2u^2 + \sigma_1^2\right)}, \quad -\infty < u < \infty.
\]


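3.4.13. Let

\[
f(x, y) = \frac{1}{2\pi\sigma^2}e^{-(1/2\sigma^2)(x^2 + y^2)}, \quad -\infty < x, y < \infty
\]

be the joint pdf of (X, Y). Let

\[
U = \sqrt{X^2 + Y^2} \quad \text{and} \quad V = \tan^{-1}\left(\frac{Y}{X}\right), \quad 0 \le V \le 2\pi.
\]

Find the joint pdf of (U, V).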
3.4.14. Let the joint pdf of (X, Y) be given by

\[
f(x, y) = \begin{cases} \beta^{-2}e^{-(x+y)/\beta}, & x, y > 0, \ \beta > 0, \\ 0, & \text{elsewhere.} \end{cases}
\]


Let U = 


and V = Y. Find the joint pdf of (U, V). 


3.4.15. Let X and Y be independent and identically distributed random variables with pdf 


r x>0, 


0, otherwise. 


Find the distribution of (X — Y)/2. 


3.4.16. Let X and Y be independent chi-square distributed random variables with $n_1$ and $n_2$
degrees of freedom, respectively. Obtain the joint distribution of (U, V), where U = X + Y
and V = X/Y.


3.5 LIMIT THEOREMS 


Limit theorems play a very important role in the study of probability theory and in its applications. In 
Chapter 2, we saw that the frequency interpretation of probability depends on the long-run proportion 
of times the outcome (event) would occur in repeated experiments. Also, in Section 3.2, we learned 
that some binomial probabilities can be computed using either the Poisson probability distribution 
or the normal probability distribution using the limiting arguments. Many random variables that we 
encounter in nature have distributions close to the normal probability distribution. These modeling
simplifications are possible because of various limit theorems. In this section, we discuss the law of
large numbers and the Central Limit Theorem.


First we give Chebyshev’s theorem, which is a useful result for proving limit theorems. It gives a lower 
bound for the area under a curve between two points that are on opposite sides of the mean and 
are equidistant from the mean. The strength of this result lies in the fact that we need not know the 
distribution of the underlying population, other than its mean and variance. This result was developed 
by the Russian mathematician Pafnuty Chebyshev (1821-1894). 


CHEBYSHEV’S THEOREM 


Theorem 3.5.1 Let the random variable X have a mean $\mu$ and standard deviation $\sigma$. Then for K > 0, a
constant,

\[ P(|X - \mu| < K\sigma) \ge 1 - \frac{1}{K^2}. \]


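Proof. We will work with the continuous case. By definition of the variance of X,

\[
\begin{aligned}
\sigma^2 = E(X - \mu)^2 &= \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx \\
&= \int_{-\infty}^{\mu - K\sigma} (x - \mu)^2 f(x)\,dx + \int_{\mu - K\sigma}^{\mu + K\sigma} (x - \mu)^2 f(x)\,dx + \int_{\mu + K\sigma}^{\infty} (x - \mu)^2 f(x)\,dx \\
&\ge \int_{-\infty}^{\mu - K\sigma} (x - \mu)^2 f(x)\,dx + \int_{\mu + K\sigma}^{\infty} (x - \mu)^2 f(x)\,dx.
\end{aligned}
\]

Note that $(x - \mu)^2 \ge K^2\sigma^2$ for $x \le \mu - K\sigma$ or $x \ge \mu + K\sigma$. The inequality above can be rewritten as

\[
\begin{aligned}
\sigma^2 &\ge K^2\sigma^2\left[\int_{-\infty}^{\mu - K\sigma} f(x)\,dx + \int_{\mu + K\sigma}^{\infty} f(x)\,dx\right] \\
&= K^2\sigma^2\left[P\{X \le \mu - K\sigma\} + P\{X \ge \mu + K\sigma\}\right] \\
&= K^2\sigma^2 P\{|X - \mu| \ge K\sigma\}.
\end{aligned}
\]

This implies that

\[ P\{|X - \mu| \ge K\sigma\} \le \frac{1}{K^2} \]

or

\[ P\{|X - \mu| < K\sigma\} \ge 1 - \frac{1}{K^2}. \]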


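We can also write Chebyshev's theorem as

\[
P\{|X - \mu| \ge \varepsilon\} \le \frac{E[(X - \mu)^2]}{\varepsilon^2} = \frac{\mathrm{Var}(X)}{\varepsilon^2}, \quad \text{for any } \varepsilon > 0.
\]

Equivalently,

\[ P\{|X - \mu| \ge K\sigma\} \le \frac{1}{K^2}. \]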

In other words, Chebyshev’s inequality states that the probability that a random variable X differs 
from its mean by at least K standard deviations is less than or equal to 1/K?(K > 2). 


In statistics, if we do not have any idea of the population distribution, Chebyshev’s theorem is 
used in the following manner. For any data set (regardless of the shape of the distribution), at least 
$(1 - (1/k^2))100\%$ of observations will lie within $k$ $(> 1)$ standard deviations of the mean. For example,
at least $(1 - (1/2^2))100\% = 75\%$ of the data will fall in the interval $(\bar{x} - 2s, \bar{x} + 2s)$ and at least 88.9% of
the observations will lie within three standard deviations of the mean. If the population distribution 
is bell shaped, we have a better result than Chebyshev’s theorem, namely, the empirical rule that 
states the following: (i) approximately 68% of the observations lie within one standard deviation 
of the mean; (ii) approximately 95% of the observations lie within two standard deviations of the 
mean; and (iii) approximately 99.7% of the observations lie within three standard deviations of the 
mean. 
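
The comparison between Chebyshev’s bound and the empirical rule can be checked numerically. The following is a minimal Python sketch, not part of the original text; the function names are illustrative, and the normal probabilities come from the standard library’s `statistics.NormalDist`.

```python
from statistics import NormalDist


def chebyshev_lower_bound(k: float) -> float:
    """Chebyshev lower bound 1 - 1/k^2 on P(|X - mu| < k*sigma).

    The bound is clipped at 0 because it carries no information for k <= 1.
    """
    return max(0.0, 1.0 - 1.0 / k**2)


def normal_within(k: float) -> float:
    """P(|Z| < k) for a standard normal Z (the basis of the empirical rule)."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)


# Compare the distribution-free bound with the bell-shaped case.
for k in (1, 2, 3):
    print(f"k={k}: Chebyshev >= {chebyshev_lower_bound(k):.3f}, "
          f"normal = {normal_within(k):.4f}")
```

For k = 2 and k = 3 the sketch reproduces the 75% and 88.9% bounds quoted above, next to the roughly 95% and 99.7% of the empirical rule.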


Example 3.5.1 
A random variable X has mean 24 and variance 9. Obtain a bound on the probability that the random
variable X assumes values between 16.5 and 31.5.


Solution 
From Chebyshev’s theorem,

$$P(\mu - K\sigma < X < \mu + K\sigma) \ge 1 - \frac{1}{K^2}.$$

Equating $\mu + K\sigma$ to 31.5 and $\mu - K\sigma$ to 16.5 with $\mu = 24$ and $\sigma = \sqrt{9} = 3$, we obtain K = 2.5.
Hence,

$$P\{16.5 < X < 31.5\} \ge 1 - \frac{1}{(2.5)^2} = 0.84.$$


Example 3.5.2
Let X be a random variable that represents the systolic blood pressure of the population of 18- to
74-year-old men in the United States. Suppose that X has mean 129 mm Hg and standard deviation
19.8 mm Hg.
(a) Obtain a bound on the probability that the systolic blood pressure of this population will assume 
values between 89.4 and 168.6 mm Hg. 
(b) In addition, assume that the distribution of X is approximately normal. Using the normal table, find 
P(89.4 < X < 168.6). Compare this with the empirical rule. 


Solution 


(a) Because we are given only the mean and standard deviation, and no distribution is specified, we use 
Chebyshev’s theorem. We have

$$P(\mu - K\sigma < X < \mu + K\sigma) \ge 1 - \frac{1}{K^2}.$$

Equating $\mu + K\sigma$ to 168.6 and $\mu - K\sigma$ to 89.4 with $\mu = 129$ and $\sigma = 19.8$, we obtain K = 2. Hence,

$$P\{89.4 < X < 168.6\} \ge 1 - \frac{1}{(2)^2} = 0.75.$$

(b) Because X is normally distributed with mean 129 and standard deviation 19.8, using the z-score, we
get

$$P(89.4 < X < 168.6) = P\left(\frac{89.4 - 129}{19.8} < Z < \frac{168.6 - 129}{19.8}\right) = P(-2 < Z < 2) = 0.9544.$$

Hence, approximately 95.44% of this population will have systolic blood pressure values between 89.4 
and 168.6 mm Hg. This compares well with the 95% value from the empirical rule. 


We could use Chebyshev’s inequality to prove the following result, which is called the weak law of 
large numbers. The law of large numbers states that if the sample size n is large, the sample mean 
rarely deviates from the mean of the distribution of X, which in statistics is called the population 
mean. 


LAW OF LARGE NUMBERS 


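Theorem 3.5.2 Let $X_1,\ldots,X_n$ be a set of pairwise independent random variables with $E(X_i) = \mu$, and
$\mathrm{Var}(X_i) = \sigma^2$. Then for any c > 0,

$$P\{\mu - c \le \bar{X} \le \mu + c\} \ge 1 - \frac{\sigma^2}{nc^2},$$

and as $n \to \infty$, the probability approaches 1. Equivalently, with $S_n = \sum_{i=1}^{n} X_i$ (so that $\bar{X} = S_n/n$),

$$P\left(\left|\frac{S_n}{n} - \mu\right| < c\right) \to 1$$

as $n \to \infty$.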
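Proof. Because $X_1,\ldots,X_n$ are pairwise independent with common mean $\mu$ and variance $\sigma^2$, we know that
$\mathrm{Var}(S_n) = n\sigma^2$, and $\mathrm{Var}(S_n/n) = \sigma^2/n$. Also, $E(S_n/n) = \mu$. By Chebyshev’s theorem, for any $\varepsilon > 0$,

$$P\left\{\left|\frac{S_n}{n} - \mu\right| \ge \varepsilon\right\} \le \frac{\sigma^2}{n\varepsilon^2}.$$

Thus, for any fixed $\varepsilon$,

$$P\left\{\left|\frac{S_n}{n} - \mu\right| \ge \varepsilon\right\} \to 0$$

as $n \to \infty$. Equivalently,

$$P\left\{\left|\frac{S_n}{n} - \mu\right| < \varepsilon\right\} \to 1$$

as $n \to \infty$.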


Thus, without any knowledge of the probability distribution function of $S_n$, the (weak) law of large
numbers states that the sample mean, $\bar{X} = S_n/n$, will differ from the population mean by less than
an arbitrary constant, $\varepsilon > 0$, with probability that tends to 1 as n tends to $\infty$. Because of this, the law
of large numbers is also called the “law of averages.” This result basically states that we can start with
a random experiment whose outcome cannot be predicted with certainty, and by taking averages,
we can obtain an experiment in which the outcome can be predicted with a high degree of accuracy.
The law of large numbers in its simplest form for the Bernoulli random variables was introduced
by Jacob Bernoulli toward the end of the 17th century. This result in generality was first proved by
the Russian mathematician A. Khintchine in 1929. This result is widely used in its applications to
insurance, statistics, and the study of heredity.


Example 3.5.3 


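Let $X_1,\ldots,X_n$ be iid Bernoulli random variables with parameter p. Verify the law of large numbers.
Solution
For Bernoulli random variables we know that $E(X_i) = p$, and $\mathrm{Var}(X_i) = p(1 - p)$. Thus, by Chebyshev’s
theorem,

$$P(p - c \le \bar{X} \le p + c) = P\left(\left|\frac{S_n}{n} - p\right| \le c\right) \ge 1 - \frac{\sigma^2}{nc^2} = 1 - \frac{p(1 - p)}{nc^2} \to 1 \quad \text{as } n \to \infty.$$

This verifies the weak law of large numbers.


Example 3.5.4
Consider n rolls of a balanced die. Let $X_i$ be the outcome of the ith roll, and let $S_n = \sum_{i=1}^{n} X_i$. Show that,
for any $\varepsilon > 0$,

$$P\left(\left|\frac{S_n}{n} - \frac{7}{2}\right| \ge \varepsilon\right) \to 0$$

as $n \to \infty$.
Solution
Because the die is balanced, $E(X_i) = 7/2$. By the law of large numbers, for any $\varepsilon > 0$,

$$P\left(\left|\frac{S_n}{n} - \frac{7}{2}\right| \ge \varepsilon\right) \to 0$$

as $n \to \infty$, or equivalently,

$$P\left(\left|\frac{S_n}{n} - \frac{7}{2}\right| < \varepsilon\right) \to 1$$

as $n \to \infty$.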
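
The die example lends itself to a quick simulation. The sketch below is an illustration added here, not the text’s own method: it uses Python’s pseudo-random generator with an arbitrary seed, and the helper name `mean_of_rolls` is hypothetical.

```python
import random


def mean_of_rolls(n: int, rng: random.Random) -> float:
    """Average outcome S_n / n of n simulated rolls of a balanced die."""
    return sum(rng.randint(1, 6) for _ in range(n)) / n


# The seed is arbitrary; it only makes the run reproducible.
rng = random.Random(12345)
for n in (10, 1_000, 100_000):
    print(n, round(mean_of_rolls(n, rng), 4))
```

As n grows, the printed averages should settle near 7/2 = 3.5, which is what the law of large numbers predicts; individual runs with small n can still be far off.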


One of the most important results in probability theory is the Central Limit Theorem. This basically 
states that the z-transform of the sample mean is asymptotically standard normal. The amazing thing 
about the Central Limit Theorem is that no matter what the shape of the original distribution is, 
the (sampling) distribution of the mean approaches a normal probability distribution. We state one 
version of the Central Limit Theorem. In a restricted case, the proof uses the idea that the
moment-generating functions of $Z_n$ converge to the moment-generating function of the standard normal
random variable. The general proof is a little bit more involved. Because the proof of the Central
Limit Theorem is available in most probability books, we will not give the proof here.


CENTRAL LIMIT THEOREM (CLT) 


Theorem 3.5.3 If $X_1,\ldots,X_n$ is a random sample from an infinite population with mean $\mu$, variance $\sigma^2$,
and the moment-generating function $M_X(t)$, then the limiting distribution of $Z_n = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ as
$n \to \infty$ is the standard normal probability distribution. That is,

$$\lim_{n \to \infty} P(Z_n \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^2/2}\,dt.$$

If $S_n = \sum_{i=1}^{n} X_i$, then we can rewrite $Z_n$ as

$$Z_n = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{n(\bar{X} - \mu)}{n\sigma/\sqrt{n}} = \frac{S_n - n\mu}{\sigma\sqrt{n}}, \quad \text{since } n\bar{X} = \sum_{i=1}^{n} X_i.$$

Then the CLT states that $Z_n = (S_n - n\mu)/(\sigma\sqrt{n})$ is approximately N(0, 1) for large n.


The Central Limit Theorem basically says that when we repeat an experiment a large number of times,
the average is approximately normally (Gaussian) distributed, provided the population variance is finite.


Example 3.5.5 
$X_1, X_2, \ldots$ are iid random variables such that

$$X_i = \begin{cases} 1, & \text{with probability } p, \\ 0, & \text{with probability } 1 - p. \end{cases}$$

Show that $Z_n = (S_n - np)/\sqrt{npq}$ is approximately normal for large n, where $S_n = \sum_{i=1}^{n} X_i$, and q = 1 - p.


Solution
We know that

$$E(X) = p; \quad E(X^2) = p; \quad \mathrm{Var}(X) = p - p^2 = pq.$$

Hence, by the CLT, the limiting distribution of $Z_n = (S_n - np)/\sqrt{npq}$ as $n \to \infty$ is the standard normal
probability distribution.


Example 3.5.6 
A soft-drink vending machine is set so that the amount of drink dispensed is a random variable with a mean
of 8 ounces and a standard deviation of 0.4 ounces. What is the approximate probability that the average
of 36 randomly chosen fills exceeds 8.1 ounces?


Solution 
From the CLT, $(\bar{X} - 8)/(0.4/\sqrt{36})$ is approximately N(0, 1). Hence, from the normal table,

$$P(\bar{X} > 8.1) = P\left\{Z > \frac{8.1 - 8.0}{0.4/\sqrt{36}}\right\} = P\{Z > 1.5\} = 0.0668.$$
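
The table lookup in this example can be reproduced in a few lines. This is a small Python sketch under the same CLT approximation; the function name and its parameters are illustrative and not from the text.

```python
from math import sqrt
from statistics import NormalDist


def prob_mean_exceeds(threshold: float, mu: float, sigma: float, n: int) -> float:
    """CLT approximation to P(sample mean > threshold) for a sample of size n."""
    z = (threshold - mu) / (sigma / sqrt(n))   # standardize the sample mean
    return 1.0 - NormalDist().cdf(z)           # upper-tail standard normal probability


# Values from the vending machine example: mean 8 oz, sd 0.4 oz, 36 fills.
p = prob_mean_exceeds(8.1, mu=8.0, sigma=0.4, n=36)
print(round(p, 4))
```

The same helper can be reused for other thresholds or sample sizes, as long as n is large enough for the normal approximation to be reasonable.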


Example 3.5.7 
Numbers in decimal form are often approximated by the closest integers. Suppose n numbers $X_1,\ldots,X_n$
are approximated by their closest integers $J_1, J_2,\ldots,J_n$. Let $U_i = X_i - J_i$. Assume that the $U_i$ are uniform on
(-0.5, 0.5) and that the $U_i$’s are independent.

(a) Show that

$$\frac{\sum_{i=1}^{n} U_i}{\sqrt{n/12}} \sim N(0, 1) \quad \text{as } n \to \infty.$$

(b) Find

$$P\left\{\frac{-5}{\sqrt{300/12}} \le \frac{\sum_{i=1}^{300} U_i}{\sqrt{300/12}} \le \frac{5}{\sqrt{300/12}}\right\}.$$

(c) Find the value of a such that $P\{-a \le \sum_{i=1}^{300} U_i \le a\} = 0.95$.

(d) For $n = 10^6$, find a such that $P\{-a \le \sum_{i=1}^{10^6} U_i \le a\} = 0.99$.


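Solution
(a) Because the $U_i$’s are uniform on (-0.5, 0.5), $E(U_i) = 0$ and $\mathrm{Var}(U_i) = 1/12$. Let $S_n = \sum_{i=1}^{n} X_i$, and
$K_n = \sum_{i=1}^{n} J_i$. Then

$$P\{|S_n - K_n| \le a\} = P\left\{-a \le \sum_{i=1}^{n} (X_i - J_i) \le a\right\} = P\left\{-a \le \sum_{i=1}^{n} U_i \le a\right\}.$$

By the CLT,

$$\frac{\sum_{i=1}^{n} U_i - 0}{\sqrt{n/12}} \sim N(0, 1) \quad \text{as } n \to \infty.$$

(b) For n = 300, a = 5. Using the normal table,

$$P\left\{\frac{-5}{\sqrt{300/12}} \le \frac{\sum_{i=1}^{300} U_i}{\sqrt{300/12}} \le \frac{5}{\sqrt{300/12}}\right\} = P\{-1 \le Z \le 1\} = 0.68.$$

(c) Now,

$$0.95 = P\left\{-a \le \sum_{i=1}^{300} U_i \le a\right\} = P\left\{\frac{-a}{\sqrt{300/12}} \le Z \le \frac{a}{\sqrt{300/12}}\right\}.$$

From the normal table, we get $a/\sqrt{300/12} = 1.96$. This implies a = 9.8.

(d) We have

$$0.99 = P\left\{-a \le \sum_{i=1}^{10^6} U_i \le a\right\} = P\left\{\frac{-a}{\sqrt{10^6/12}} \le Z \le \frac{a}{\sqrt{10^6/12}}\right\}.$$

Now, using the normal table, we have $a/\sqrt{10^6/12} = 2.58$. Hence, a = 745.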
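
Parts (c) and (d) amount to scaling a normal quantile by the standard deviation of the sum. The following Python sketch illustrates this; the helper `symmetric_bound` is hypothetical, and it uses the exact normal quantile from `statistics.NormalDist` instead of a rounded table value.

```python
from math import sqrt
from statistics import NormalDist


def symmetric_bound(n: int, coverage: float) -> float:
    """Value a with P(-a <= sum of n rounding errors <= a) ~ coverage under the CLT.

    Each rounding error is assumed uniform on (-0.5, 0.5), so its variance is 1/12
    and the sum has standard deviation sqrt(n / 12).
    """
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)  # two-sided normal quantile
    return z * sqrt(n / 12.0)


print(round(symmetric_bound(300, 0.95), 2))     # part (c)
print(round(symmetric_bound(10**6, 0.99), 1))   # part (d)
```

For part (d) the exact quantile (about 2.576) gives a value slightly below the 745 obtained with the rounded table entry 2.58; the difference comes only from that rounding.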


Example 3.5.8 
A casino has a coin that is suspected to be biased. The casino wants to estimate p (the probability of heads)
so that, with probability 0.99, the estimate (say, $\hat{p}$) is within 0.01 of the unknown p. What is the minimum
number of times the coin must be tossed?
Solution
Set

$$X_j = \begin{cases} 1, & \text{if H at the } j\text{th toss}, \\ 0, & \text{if T at the } j\text{th toss}. \end{cases}$$

Suppose we decide to use $\hat{p} = \bar{X}$, that is, the proportion of heads in n tosses.
We want $P\{|\bar{X} - p| \le 0.01\} = 0.99$.


Because $Y = \sum_{i=1}^{n} X_i \sim \mathrm{Bin}(n, p)$, we have EY = np, Var(Y) = npq. By the CLT, $(\bar{X} - p)/\sqrt{pq/n}$ is
approximately N(0, 1). Now,

$$0.99 = P\left\{\frac{-0.01}{\sqrt{pq/n}} \le \frac{\bar{X} - p}{\sqrt{pq/n}} \le \frac{0.01}{\sqrt{pq/n}}\right\} \approx P\left\{\frac{-0.01}{\sqrt{pq/n}} \le Z \le \frac{0.01}{\sqrt{pq/n}}\right\}.$$

Using the normal table, $0.01/\sqrt{pq/n} = 2.58$; this implies that $\sqrt{n} \ge 2.58\sqrt{pq}/0.01$.
Because the maximum of pq is 1/4, it is sufficient that

$$\sqrt{n} \ge \frac{(2.58)\sqrt{1/4}}{0.01} = 129.$$

Hence, $n = (129)^2 = 16{,}641$, and we should choose the sample size $n \ge 16{,}641$.


The Central Limit Theorem is extremely important in statistics because it says that we can approximate
the distribution of certain statistics without much knowledge about the underlying
distribution of that statistic, for a relatively “large” sample size. How large n should be for
this normal approximation to work depends on the shape of the underlying population distribution.
A common rule of thumb is that the sample size n should be at least 30. We deal with these issues in
Chapter 4.


EXERCISES 3.5 


3.5.1. Let X be a random variable with probability density function

$$f(x) = \begin{cases} 630x^4(1 - x)^4, & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases}$$

(a) Obtain the lower bound given by Chebyshev’s inequality for P{0.2 < X < 0.8}.
(b) Compute the exact probability, P{0.2 < X < 0.8}.


3.5.2. Suppose that the number of cars arriving in 1 hour at a busy intersection is a Poisson
probability distribution with $\lambda = 100$. Find, using Chebyshev’s inequality, a lower bound
for the probability that the number of cars arriving at the intersection in 1 hour is between
70 and 130.


3.5.3. Prove Chebyshev’s inequality for the discrete case. 


3.5.4. Suppose that the number of cars arriving at a busy intersection in a large city has a Poisson 
distribution with mean 120. Determine a lower bound for the probability that the number 
of cars arriving in a given 20-minute period will be between 100 and 140 using Chebyshev’s 
inequality. 


3.5.5. Find the smallest value of n in a binomial distribution for which we can assert that

$$P\left(\left|\frac{X_n}{n} - p\right| < 0.1\right) \ge 0.90.$$

3.5.6. How large should the size of a random sample be so that we can be 90% certain that the
sample mean $\bar{X}$ will not deviate from the true mean by more than $\sigma/2$?


3.5.7. Let a fair coin be tossed n times and let $S_n$ be the number of heads that turn up. Show that
the fraction of heads, $S_n/n$, will be near to 1/2 for large n. What can we conclude if the coin
is not fair?


3.5.8. Suppose that a failure of certain component follows the distribution $f(x) = p^x(1 - p)^{1-x}$
for x = 0, 1, and zero, elsewhere. How many components must one test in order that the
sample mean $\bar{X}$ will lie within 0.4 of the true state of nature with probability at least as great
as 0.95?


3.5.9. Let $X_1,\ldots,X_n$ be a sequence of mutually independent random variables, with probability
distribution

$$P(X_i = \sqrt{i}) = \frac{1}{2} \quad \text{and} \quad P(X_i = -\sqrt{i}) = \frac{1}{2}.$$

Show that this sequence of random variables does not satisfy the conditions of the law of
large numbers.
3.5.10. Give a proof of the Central Limit Theorem. 


3.5.11. Let $X_1,\ldots,X_n$ be independent discrete random variables identically distributed as

$$f(x_i) = \begin{cases} 0.2, & x_i = 0, 1, 2, 3, 4, \\ 0, & \text{otherwise}. \end{cases}$$

Using the CLT, find the approximate value of $P(\bar{X}_{100} > 2)$, where $\bar{X}_{100} = (1/100) \sum_{i=1}^{100} X_i$.


3.5.12. Let $X_1,\ldots,X_n$ be a sequence of independent Poisson-distributed random variables, with
parameter $\lambda$. Let $S_n = \sum_{i=1}^{n} X_i$. Show that $Z_n = (S_n - n\lambda)/\sqrt{n\lambda} \sim N(0, 1)$ for large n.


3.5.13. Let $X_1,\ldots,X_n$ be a sequence of independent random variables uniformly distributed over [0,1).
Let $S_n = \sum_{i=1}^{n} X_i$. Show that $Z_n = (S_n - n\mu)/(\sigma\sqrt{n}) \sim N(0, 1)$ for large n, where $\mu = 1/2$ and $\sigma^2 = 1/12$.


3.5.14. Suppose that 2500 customers subscribe to a telephone exchange. There are 80 trunk lines 
available. Any one customer has the probability of 0.03 of needing a trunk line on a given 
call. Consider the situation as 2500 trials with probability of “success” p= 0.03. What is 
the approximate probability that the 2500 customers will “tie up” the 80 trunk lines at any 
given time? 


3.5.15. Suppose a group of people have an average IQ of 122 with standard deviation 2. Obtain a 
bound on the probability that IQ values of this group will be between 104 and 120. 


3.5.16. Let X be a random variable that represents the diastolic blood pressure (DBP) of the 
population of 18- to 74-year-old men in the United States who are not taking any corrective 
medication. Suppose that X has mean 80.7 mm Hg and standard deviation 9.2 mm Hg.

(a) Obtain a bound on the probability that the DBP of this population will assume values
between 53.1 and 108.3 mm Hg.

(b) In addition, assume that the distribution of X is approximately normal. Using the 
normal table, find P(53.1 < X < 108.3). Compare this with the empirical rule. 


3.5.17. Color blindness appears in 2% of the people in a certain population. How large must a 
random sample be in order to be 99% certain that a color-blind person is included in the 
sample? 


3.5.18. A shirt manufacturer knows that, on the average, 2% of his product will not meet quality 
specifications. Find the greatest number of shirts constituting a lot that will have, with 
probability 0.95, fewer than five defectives. 


3.5.19. A random sample of size 100 is taken from a population with mean 1 and variance 0.04. 
Find the probability that the sample mean is between 0.99 and 1. 


3.5.20. The lifetime X (in hours) of a certain electrical component has the pdf $f(x) = (1/3)e^{-(1/3)x}$, x > 0.
If a random sample of 36 is taken from these components, find
$P(\bar{X} < 2)$.


3.5.21. A drug manufacturer receives a shipment of 10,000 calibrated “eyedroppers” for administer- 
ing the Sabin poliovirus vaccine. If the calibration mark is missing on 500 droppers, which 
are scattered randomly throughout the shipment, what is the probability that, at most, two 
defective droppers will be detected in a random sample of 125? 


3.6 CHAPTER SUMMARY 


In this chapter we looked at some special distribution functions that arise in practice. It should
be noted that we discussed only a few of the important probability distributions. There are many
other discrete and continuous distributions that will be useful and appropriate in particular applications.
Some of them are given in Appendix A3. A larger list of probability distributions can
be found at http://www.causascientia.org/math_stat/Dists/Compendium.pdf, among many other
places. For more than one random variable, we studied joint distributions. We also saw how
to find the density and cumulative distribution for functions of a random variable. Limit theorems
are a crucial part of probability theory. We have introduced Chebyshev’s inequality, the law
of large numbers, and the Central Limit Theorem for random variables.


We now list some of the key definitions introduced in this chapter: 


Bernoulli probability distribution 

Binomial experiment 

Poisson probability distribution 

Probability distribution 

Normal (or Gaussian) probability distribution 
Standard normal random variable 

Gamma probability distribution 
Exponential probability distribution 
Chi-square ($\chi^2$) distribution

Joint probability density function 

Bivariate probability distributions 

Marginal pdf 

Conditional probability distribution 
Independence of two r.v.s 

Expected value of a function of bivariate r.v.s 
Conditional expectation 

Covariance 

Correlation coefficient 


In this chapter, we have also learned the following important concepts and procedures: 


Mean, variance, and moment-generating function (mgf) of a binomial random variable 
Mean, variance, and mgf of a Poisson random variable

Poisson approximation to the binomial probability distribution 

Mean, variance, and mgf of a uniform random variable

Mean, variance, and mgf of a normal random variable 

Mean, variance, and mgf of a gamma random variable 

Mean, variance, and mgf of an exponential random variable 

Mean, variance, and mgf of a chi-square random variable

Properties of expected value 

Properties of the covariance and correlation coefficient 

Procedure to find the cdf of a function of r.v. using the method of distribution functions 
The pdf of Y = g(X), where g is differentiable and monotone increasing or decreasing 
The pdf of Y = g(X), using the probability integral transformation 

The transformation method to find the pdf of Y = g(X1,..., Xn) 

Chebyshev’s theorem 

Law of large numbers 

Central Limit Theorem (CLT)


3.7 COMPUTER EXAMPLES (OPTIONAL)
3.7.1 Minitab Examples 


Minitab contains subroutines that can do pdf and cdf computations. For example, for binomial
random variables, the pdf and cdf can be respectively computed using the following commands.


MTB > pdf k; 
SUBC > binomial n p. 


and 


MTB > cdf; 
SUBC > binomial n p. 


Practice: Try the following and see what you get. 


MTB > pdf 3; 
SUBC > binomial 5 0.40. 


will give 
K  P(X=K) 
3.00 0.2304 
and 
MTB > cdf; 


SUBC > binomial 5 0.40. 


will give 


BINOMIAL WITH N = 5 P = 0.400000
K    P(X LESS OR = K)
0    0.0778
1    0.3370
2    0.6826
3    0.9130
4    0.9898
5    1.0000


Similarly, if we want to calculate the cdf for a normal probability distribution with mean k and
standard deviation s, use the following commands.


MTB > cdf x;
SUBC > normal k s.


will give $P(X \le x)$.


Practice: Try the following. 


MTB > cdf 4.20; 
SUBC > normal 4 2. 


We can use the invcdf command to find the inverse cdf. For a given probability p, $P(X \le x) = F(x) = p$,
we can find x for a given distribution. For example, for a normal probability distribution with
mean k and standard deviation s, use the following.


MTB > invcdf p;
SUBC > normal k s.


We can also use the pull-down menus to compute the probabilities. The following example illustrates 
this for a binomial probability distribution. 


Example 3.7.1 
A manufacturer of a color printer claims that only 5% of their printers require repairs within the first year. If 
out of a random sample of 18 of their printers, four required repairs within the first year, does this tend to 
refute or support the manufacturer's claim? Use Minitab. 


Solution 
Type the numbers 1 through 18 in C1. Then 


Calc > Probability Distributions > Binomial. . . > choose Cumulative probability > in Number of
trials, enter 18 and in Probability of success, enter 0.05 > in Input column: type C1 > Click OK


We will get the following output.
Cumulative Distribution Function
Binomial with n=18 and p=0.0500000
    x    P(X<=x)
 1.00    0.7735
 2.00    0.9419
 3.00    0.9891
 4.00    0.9985
 5.00    0.9998
 6.00    1.0000
 7.00    1.0000
 8.00    1.0000
 9.00    1.0000
10.00    1.0000
11.00    1.0000
12.00    1.0000
13.00    1.0000
14.00    1.0000
15.00    1.0000
16.00    1.0000
17.00    1.0000
18.00    1.0000


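The required probability is $P(X \ge 4) = 1 - P(X \le 3) = 1 - 0.9891 = 0.0109$. Because this probability is
small when p = 0.05, the sample result tends to refute the manufacturer’s claim.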
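
For readers without Minitab, the same binomial computations can be sketched in Python using only the standard library. This is an illustrative equivalent, not Minitab’s own algorithm; the function names `binom_pmf` and `binom_cdf` are assumptions made here.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summing the pmf directly."""
    return sum(binom_pmf(j, n, p) for j in range(k + 1))


# Counterpart of "pdf 3; binomial 5 0.40." in the practice example.
print(round(binom_pmf(3, 5, 0.40), 4))

# Counterpart of the printer example: P(X >= 4) with n = 18, p = 0.05.
print(round(1 - binom_cdf(3, 18, 0.05), 4))
```

Direct summation is adequate for small n such as these; for very large n a library routine would usually be preferable.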


3.7.2 SPSS Examples 


Example 3.7.2 
For the data of Example 3.7.1, using SPSS, find $P(X \le 3)$.


Solution
Enter numbers 1 through 18 in C1. Then use the following.


Transform > Compute > type in the Target Variable: y > Use the scroll bar beside the Functions
box to find CDF.BINOM(q, n, p) > Highlight it and use the up button to load it into the Numeric
Expression: box. Set q to 3 (success, the x-value), n to 18 (total trials) and p to 0.05 (probability of
success) > OK


In the second column, we will get the y-values as 0.99. Hence, $P(X \le 3) = 0.99$.


We can use this procedure for many other distributions.


3.7.3 SAS Examples


Sometimes, we can use computer calculations to find out the exact probability of a certain event in 
lieu of approximations. For example, when n is large in a binomial experiment, we can use normal 
approximation to calculate the probabilities. The following example shows how to calculate binomial 
probabilities using SAS codes. 



Example 3.7.3 
Suppose that a certain drug to treat a disease has a success rate of p = 0.65. This drug is given to n = 500
patients with the disease.

(a) What is the probability that 335 or fewer show improvement? 

(b) What is the probability that more than 320 show improvement? 

(c) What is the probability that exactly 300 show improvement? 

(d) What is the probability that the number of improvements lies in the interval (300,350)? 


Solution 
Let X =number of patients showing improvement. Then X is a binomial random variable with parameters 
n = 500 and p = 0.65. 
(a) First three lines in the following code are comment lines. In general, it is always helpful to include 
the comment lines to explain about the program. 


/*This program can be used to compute probability*/
/* that a Binomial variable with parameters p*/
/*and n is less than or equal to x*/
data binomial;
p=0.65;
n=500;
x=335;
y=probbnml(p,n,x);
cards;
proc print;
run;


The following is the SAS output from running the foregoing program. 


Obs    p       n      x      y
1      0.65    500    335    0.83753


Here y = 0.83753 is $P(X \le 335)$.
(b) To calculate P(X > 320), we can use the following. 


data binomial;
p=0.65;
n=500;
x=320;
y=probbnml(p,n,x);
z=1-y;
cards;
proc print;
run;


The following is the SAS output from running the foregoing program, where the value of z is the probability
we are looking for.


Obs    p       n      x      y          z
1      0.65    500    320    0.33516    0.66484


Hence, P(X > 320) = 0.66484.
(c) To find P(X = 300), we can use the following. 


data binomial;
p=0.65;
n=500;
x1=300;
y1=probbnml(p,n,x1);
x2=299;
y2=probbnml(p,n,x2);
z=y1-y2;
cards;
proc print;
run;


The following is the SAS output from running the foregoing program, where the value of z is the probability
we are looking for.


Obs    p       n      x1     y1          x2     y2             z
1      0.65    500    300    0.011327    299    0.008864418    0.002462253


(d) To find P(300 < X < 350), use the following.


data binomial;
p=0.65;
n=500;
x1=300;
y1=probbnml(p,n,x1);
x2=349;
y2=probbnml(p,n,x2);
z=y2-y1;
cards;
proc print;
run;


We will get the following output. 


Obs    p       n      x1     y1          x2     y2         z
1      0.65    500    300    0.011327    349    0.98982    0.97849


Hence, P(300 < X < 350) = 0.97849.


Similar procedures could be used to calculate probabilities for other distributions. 
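
To see how the exact binomial probabilities relate to the normal approximation mentioned at the start of this subsection, one can compute both side by side. The Python sketch below is an illustration, not a translation of the SAS program; the function names are hypothetical, and the approximation uses a continuity correction, which is a choice made here.

```python
from math import comb, sqrt
from statistics import NormalDist


def binom_cdf(k: int, n: int, p: float) -> float:
    """Exact P(X <= k) for X ~ Binomial(n, p) by direct summation."""
    q = 1.0 - p
    return sum(comb(n, j) * p**j * q ** (n - j) for j in range(k + 1))


def normal_approx_cdf(k: int, n: int, p: float) -> float:
    """Normal approximation to P(X <= k) with a continuity correction of 0.5."""
    return NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(k + 0.5)


n, p = 500, 0.65
for k in (335, 320):
    print(k, round(binom_cdf(k, n, p), 5), round(normal_approx_cdf(k, n, p), 5))
```

With n = 500 the two columns should agree to roughly two or three decimal places, which is why the approximation is often acceptable when exact computation is inconvenient.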


In order to test for normality of a given data set using a normal probability plot, we can use PROC 
UNIVARIATE (see Chapter 1 for explanation) in the following manner. Normal plot is called qqplot 
in SAS. 


proc univariate data=K noprint; /*Specify the name of data set as K*/
qqplot standard;
run;
quit;


Note that this avoids printing of all the standard output due to the univariate command, and we get 
only the QQ plot. If we need a straight line in the plot, we can modify the commands as follows. 


proc univariate data=K noprint; /*Specify the name of data set as K*/
qqplot standard / normal(mu=m sigma=s);
run;
quit;


PROJECTS FOR CHAPTER 3 
3A. Mixture Distribution 


In statistical modeling, if the data are contaminated by outliers or if the samples are drawn from a
population formed by a mixture of two populations, one could use mixture distributions. Mixture
distributions are used frequently in medical applications, such as microarray analysis. Suppose a
random variable X has pdf $f_1(x)$ with probability $p_1$, and pdf $f_2(x)$ with probability $p_2$, where
$p_1 + p_2 = 1$. Then we say that the r.v. X has a mixture distribution. This can be thought of as observing
a Bernoulli random variable Y that is equal to 1 with probability $p_1$ and 2 with probability $p_2$.
Thus,

$$X = \begin{cases} X_1 \sim f_1(x), & \text{if } Y = 1, \\ X_2 \sim f_2(x), & \text{if } Y = 2. \end{cases}$$

(a) Show that the pdf of X is given by $f(x) = p_1 f_1(x) + p_2 f_2(x)$.
(b) If $(\mu_1, \sigma_1^2)$ and $(\mu_2, \sigma_2^2)$ are means and variances of $f_1(x)$ and $f_2(x)$, respectively, show
that

$$\mu = E(X) = p_1\mu_1 + p_2\mu_2,$$

and

$$\sigma^2 = \mathrm{Var}(X) = p_1\sigma_1^2 + p_2\sigma_2^2 + p_1\mu_1^2 + p_2\mu_2^2 - (p_1\mu_1 + p_2\mu_2)^2.$$


3B. Generating Samples from Exponential and Poisson Probability
Distribution
(a) Generate a sample from $(1/\theta)e^{-x/\theta}$ ($\theta$ is chosen). Let $Y_1, Y_2,\ldots,Y_n$ be a sample from a U(0, 1)
distribution. Let $F(x) = 1 - e^{-x/\theta}$ (cdf of exponential). Then Y = F(X) is uniform. $y_i = 1 - e^{-x_i/\theta}$
implies $x_i = -\theta \ln(1 - y_i) = -\theta \ln u_i$, where $u_i = 1 - y_i$, so that $u_1, u_2,\ldots,u_n$ is again a sample from U(0, 1). Then
$X_1,\ldots,X_n$ is a sample from an exponential distribution with parameter $\theta$.
(b) Suppose we want to generate a sample from a Poisson probability distribution with parameter
$\lambda$. Generate $X_1, X_2,\ldots$ from an exponential distribution with parameter $1/\lambda$ until $\sum_{i=1}^{n} X_i$
just exceeds 1. Then y = n - 1 is a sample value from a Poisson probability distribution with
parameter $\lambda$.
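
The two generators described in this project can be sketched in Python as follows. This is a minimal illustration under the project’s parametrization (the exponential parameter is the mean); the function names and the use of `random.Random` are assumptions made here.

```python
import math
import random


def exponential_sample(theta: float, rng: random.Random) -> float:
    """Inverse-transform draw from the density (1/theta) * exp(-x/theta)."""
    u = 1.0 - rng.random()          # uniform on (0, 1], which avoids log(0)
    return -theta * math.log(u)


def poisson_sample(lam: float, rng: random.Random) -> int:
    """Add exponential draws with mean 1/lam until the sum just exceeds 1;
    the number of draws minus one is the Poisson(lam) value."""
    total, n = 0.0, 0
    while True:
        total += exponential_sample(1.0 / lam, rng)
        n += 1
        if total > 1.0:
            return n - 1


rng = random.Random(2024)  # arbitrary seed for reproducibility
print([round(exponential_sample(2.0, rng), 3) for _ in range(5)])
print([poisson_sample(3.0, rng) for _ in range(10)])
```

A simple sanity check is that the average of many exponential draws is close to theta and the average of many Poisson draws is close to lam.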


EXERCISE 3B 


Let $u_1, u_2,\ldots,u_n$ be a sample from U(0, 1). Show that


(i) $X = -2 \sum_{i=1}^{n} \ln(u_i) \sim \chi^2_{2n}$,


(ii) $X = -\beta \sum_{i=1}^{\alpha} \ln(u_i) \sim \mathrm{gamma}(\alpha, \beta)$, and


(iii) $X = \dfrac{\sum_{i=1}^{\alpha} \ln(u_i)}{\sum_{i=1}^{\alpha+\beta} \ln(u_i)} \sim \mathrm{Beta}(\alpha, \beta)$.


3C. Coupon Collector’s Problem 

Suppose there are n distinct colors of coupons. Each color of coupon is equally likely to occur. When
a complete set of coupons with each color represented is assembled, you win a prize. Let X be the number of
coupons needed to obtain a complete set. Find (a) the distribution of X, (b) E(X), and (c) Var(X).


3D. Recursive Calculation of Binomial and Poisson Probabilities
A simple way to calculate binomial probabilities is as follows: For a given n and p, evaluate b(0, n, p)
and then apply the recursive relationship

$$b(x + 1, n, p) = b(x, n, p)\,\frac{p(n - x)}{(x + 1)(1 - p)}$$

to obtain other binomial probabilities.


(a) Derive this recursion formula.
(b) For n = 15, p = 0.4, using the recursive formula, compute all other probabilities starting from
x = 0.


The following recursive formulas are very useful in calculating successive Poisson probabilities:

$$f(x - 1, \lambda) = \frac{x}{\lambda}\, f(x, \lambda)$$

and

$$f(x + 1, \lambda) = \frac{\lambda}{x + 1}\, f(x, \lambda).$$

For example, if $\lambda = 2.5$, we know that $f(0, 2.5) = e^{-2.5} = 0.08208$. Using this, calculate (c) f(1, 2.5)
and f(2, 2.5).
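
The forward recursions $b(x+1, n, p) = b(x, n, p)\,p(n-x)/((x+1)(1-p))$ and $f(x+1, \lambda) = f(x, \lambda)\,\lambda/(x+1)$ translate directly into loops. The following Python sketch is one possible way to organize the computation; the function names are illustrative, and p is assumed to lie strictly between 0 and 1.

```python
from math import exp


def binomial_probs(n: int, p: float) -> list[float]:
    """All probabilities b(0), ..., b(n) for Binomial(n, p), built recursively
    from b(0) = (1 - p)^n. Assumes 0 < p < 1."""
    probs = [(1.0 - p) ** n]
    for x in range(n):
        probs.append(probs[-1] * p * (n - x) / ((x + 1) * (1.0 - p)))
    return probs


def poisson_probs(lam: float, x_max: int) -> list[float]:
    """Probabilities f(0), ..., f(x_max) for Poisson(lam), built recursively
    from f(0) = exp(-lam)."""
    probs = [exp(-lam)]
    for x in range(x_max):
        probs.append(probs[-1] * lam / (x + 1))
    return probs


# Small check against a value seen earlier: P(X = 3) for n = 5, p = 0.4.
print(round(binomial_probs(5, 0.4)[3], 4))
```

Each probability costs one multiplication and one division once its predecessor is known, which avoids evaluating factorials or binomial coefficients repeatedly.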


Chapter 


Sampling Distributions 


Objective: In this chapter we study the probability distributions of various sample statistics such as 
the sample mean and the sample variance and illustrate their usefulness.


Abraham de Moivre
(Source: http://en.wikipedia.org/wiki/File:Abraham_de_Moivre.jpg)


Abraham de Moivre (1667-1754) was a French mathematician known for his work on the normal
distribution and probability theory. He is famous for de Moivre’s formula, which links complex
numbers and trigonometry. He fled France and went to England to escape the persecution of Protestants.
In England he wrote a book on probability theory, titled The Doctrine of Chances. This book
was very popular among gamblers. The normal distribution was first introduced by de Moivre in an
article in 1733 in the context of approximating certain binomial distributions for large n, and the result is now
called the theorem of de Moivre-Laplace.


4.1 INTRODUCTION 


Sampling distributions play a very important role in statistical analysis and decision making. We 
begin with studying the distribution of a statistic computed from a random sample. Based on 
the probabilistic foundation of Chapters 2 and 3, the present study marks the beginning of our 
learning of statistics beyond the descriptive phase. Because a sample is a set of random variables 
$X_1,\ldots,X_n$, it follows that a sample statistic that is a function of the sample is also random. We
call the probability distribution of a sample statistic its sampling distribution. Sampling distributions 
provide the link between probability theory and statistical inference. The ability to determine the 
distribution of a statistic is a critical part in the construction and evaluation of statistical proce- 
dures. It is important to observe that there is a difference between the distribution of population 
from which the sample was taken and the distribution of the sample statistic. In general, a pop- 
ulation has a distribution called a population distribution, which is usually unknown, whereas a 
statistic has a sampling distribution, which is usually different from the population distribution. 
The sampling distribution of a statistic provides a theoretical model of the relative frequency histogram 
for the likely values of the statistic that one would observe through repeated sampling. Even though 
some of the terms in this section have already been defined in Chapter 1, we now present these 
definitions in terms of random variables. These abstractions are introduced to develop scientifi- 
cally based methods of analyzing the data, and one should always keep in mind the underlying 
population. 


Definition 4.1.1 A sample is a set of observable random variables $X_1,\ldots,X_n$. The number n is called the
sample size. 


In most of the inferential procedures that we study in this book, we are dealing with random samples. 
We call the random variables $X_1,\ldots,X_n$ identically distributed if every $X_i$ has the same probability
distribution. 


Definition 4.1.2 A random sample of size n from a population is a set of n independent and identically 
distributed (iid) observable random variables X1,..., Xn. 


Note that in a sample (not a random sample), the $X_i$’s need not be independent or identically distributed.
For the results of this book to be applicable, it is important to ensure that the selection of a sample is at
least approximately random. The significance of random sampling is that the probability distribution
of a statistic can be easily derived. Random sampling also helps us to control systematic bias. For a finite
population, one of the simplest ways to select a random sample is to number the elements of the
population serially and then select the sample with the help of a table of random digits. When the
population size is very large, such a method can become very taxing and sometimes practically
impossible. However, there are excellent computer
programs for generating random samples from large populations, and these programs can be used.
Now we define a statistic.


Definition 4.1.3 A function T of observable random variables $X_1,\ldots,X_n$ that does not depend on any
unknown parameters is called a statistic. 


The sample mean $\bar{X} = (1/n) \sum_{i=1}^{n} X_i$ is a function of $X_1,\ldots,X_n$. The sample median and sample
variance $S^2$ are also examples of statistics. It is important to observe that even with random sampling,
there is sampling variability or error. That is, if we select different samples from the same popu- 
lation, a statistic will take different values in different samples. Thus, a sample statistic is a random 
variable, and hence it has a probability distribution. In order for us to study the behavior of the 
phenomenon a sample statistic represents, we must identify its probability distribution. 


Definition 4.1.4 The probability distribution of a sample statistic is called the sampling distribution. 


We can illustrate these definitions with the following example with a finite population and a finite 
sample size. In this case, we take all possible samples of size n from a population of size N. 


Example 4.1.1 
Let the population consist of the numbers {1, 2, 3, 4, 5}. Consider all possible samples consisting of three 
numbers randomly chosen without replacement from this population. Obtain the distribution of the sample 
mean. 


Solution
Disregarding the order, it is clear that there are $\binom{5}{3} = 10$ equally likely possible samples of size 3. They are

(1,2,3), (1,2,4), (1,2,5), (1,3,4), (1,3,5), (1,4,5), (2,3,4), (2,3,5), (2,4,5), and (3,4,5). Calculating the mean, $\bar{X}$, for
each of the samples, we will get the sampling distribution of $\bar{X}$ as


$\bar{x}$     |  2   | 7/3  | 8/3  |  3   | 10/3 | 11/3 |  4
$p(\bar{x})$  | 1/10 | 1/10 | 2/10 | 2/10 | 2/10 | 1/10 | 1/10


For example, in the table, $P(\bar{X} = 8/3) = 2/10$ because the two samples (1,2,5) and (1,3,4) both give
$\bar{X} = 8/3$, which is an estimate of the population mean, $\mu$.
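The enumeration in this example can be reproduced with a short Python sketch. It lists every unordered sample of size 3 from {1, 2, 3, 4, 5}, computes each sample mean as an exact fraction, and tallies the relative frequencies. The variable names are illustrative choices.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4, 5]
sample_size = 3

# Every unordered sample drawn without replacement is equally likely.
samples = list(combinations(population, sample_size))
mean_counts = Counter(Fraction(sum(s), sample_size) for s in samples)

# Sampling distribution: value of the sample mean -> probability.
sampling_distribution = {
    mean: Fraction(count, len(samples))
    for mean, count in sorted(mean_counts.items())
}

for mean, prob in sampling_distribution.items():
    print(f"sample mean = {mean}, probability = {prob}")
```

The fractions are printed in lowest terms, so a probability of 2/10 appears as 1/5.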


In general, sampling distributions are theoretical distributions that consist of possibly an infinite number of
sample statistics taken from an infinite number of randomly selected samples of a fixed sample size. For
example, if a sample of size n = 30 were taken from a large population an infinite number of times, the
combined means taken from all the samples would make up the sampling distribution of the mean. Every
sample statistic has a sampling distribution. The next result states that if one selects a random sample from
a population with mean $\mu$ and variance $\sigma^2$, then regardless of the form of the population distribution, one
can obtain the mean and standard deviation of the statistic $\bar{X}$ in terms of the mean and standard deviation
of the population.

Theorem 4.1.1 Let $X_1, \ldots, X_n$ be a random sample of size n from a population with mean $\mu$ and variance
$\sigma^2$. Then $E(\bar{X}) = \mu$ and $\mathrm{Var}(\bar{X}) = \sigma^2/n$.


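Proof. The mean and variance of $\bar{X}$ are given by

\[
E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \frac{1}{n}\,n\mu = \mu
\]

and

\[
\mathrm{Var}(\bar{X}) = \mathrm{Var}\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{1}{n^2}\,n\sigma^2 = \frac{\sigma^2}{n},
\]

where the second equality for the variance holds because the $X_i$'s are independent and
$\mathrm{Var}(aX_i) = a^2\,\mathrm{Var}(X_i)$.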


We denote $E(\bar{X}) = \mu_{\bar{X}}$ and $\mathrm{Var}(\bar{X}) = \sigma^2_{\bar{X}}$. Note that from the previous theorem, $\mu_{\bar{X}} = \mu$ and
$\sigma_{\bar{X}} = \sigma/\sqrt{n}$. Here, $\sigma_{\bar{X}}$ is called the standard error of the mean. It is important to notice that the
variance of each of the random variables $X_1, X_2, \ldots, X_n$ is $\sigma^2$, whereas the variance of the sample
mean $\bar{X}$ is $\sigma^2/n$, which is smaller than the population variance $\sigma^2$ for $n \geq 2$.


The implication of Theorem 4.1.1 is that the sample mean becomes more and more reliable as an
estimate of $\mu$ as the sample size is increased, as we would expect. From Chebyshev’s inequality, for any $k > 0$,

\[
P\left(|\bar{X} - \mu_{\bar{X}}| < k\sigma_{\bar{X}}\right) \geq 1 - \frac{1}{k^2}.
\]

Let $\varepsilon = k\sigma/\sqrt{n}$. Then $k = \varepsilon\sqrt{n}/\sigma$. Since $\mu_{\bar{X}} = \mu$, the above inequality can be written as

\[
P\left(|\bar{X} - \mu| < \varepsilon\right) \geq 1 - \frac{\sigma^2}{n\varepsilon^2}.
\]

Thus, for any $\varepsilon > 0$, the probability that the difference between $\bar{X}$ and $\mu$ is less than $\varepsilon$ can be made
arbitrarily close to 1 by choosing the sample size n sufficiently large. We illustrate this result in the
following example.


Example 4.1.2 
A particular brand of drink has an average of 12 ounces per bottle. As a result of randomness, there will be
small variations in how much liquid each bottle really contains. It has been observed that the amount of
liquid in these bottles is normally distributed with $\sigma = 0.8$ ounce. A sample of 10 bottles of this brand of
soda is randomly selected from a large lot of bottles, and the amount of liquid, in ounces, is measured in
each. Find the probability that the sample mean will be within 0.5 ounce of 12 ounces.


Solution 

Let $X_1, X_2, \ldots, X_{10}$ denote the ounces of liquid measured for each of the bottles. We know that the $X_i$'s are
normally distributed with mean $\mu = 12$ and variance $\sigma^2 = 0.64$. From Theorem 4.1.1, $\bar{X}$ has mean 12 and
variance $\sigma^2/n = 0.64/10 = 0.064$; moreover, $\bar{X}$ possesses a normal distribution (for the normality part, we
use Corollary 4.2.2). We find


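\[
\begin{aligned}
P(|\bar{X} - 12| < 0.5) &= P(-0.5 < \bar{X} - 12 < 0.5) \\
&= P\left(\frac{-0.5}{\sigma/\sqrt{n}} < \frac{\bar{X} - 12}{\sigma/\sqrt{n}} < \frac{0.5}{\sigma/\sqrt{n}}\right) \\
&= P\left(\frac{-0.5}{0.253} < Z < \frac{0.5}{0.253}\right) \\
&= P(-1.97 < Z < 1.97) \\
&= 0.9512 \quad \text{(using the standard normal table).}
\end{aligned}
\]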


Hence, the chance is about 95% that the mean amount of drink in any 10 bottles randomly chosen will be
between 11.5 and 12.5 ounces.
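The calculation in Example 4.1.2 can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library to evaluate the normal probability, and it also computes the Chebyshev lower bound $1 - \sigma^2/(n\varepsilon^2)$ derived before the example. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 12.0        # population mean (ounces)
sigma = 0.8      # population standard deviation (ounces)
n = 10           # sample size
epsilon = 0.5    # allowed distance between the sample mean and mu

# Sampling distribution of the sample mean: N(mu, sigma^2 / n).
standard_error = sigma / sqrt(n)
sample_mean_dist = NormalDist(mu, standard_error)

# Normal probability P(|X-bar - mu| < epsilon).
normal_probability = (
    sample_mean_dist.cdf(mu + epsilon) - sample_mean_dist.cdf(mu - epsilon)
)

# Distribution-free Chebyshev lower bound 1 - sigma^2 / (n * epsilon^2).
chebyshev_bound = 1 - sigma**2 / (n * epsilon**2)

print(f"standard error     = {standard_error:.4f}")
print(f"normal probability = {normal_probability:.4f}")
print(f"Chebyshev bound    = {chebyshev_bound:.4f}")
```

The computed probability differs slightly from the table-based 0.9512 because the table lookup rounds the z-value to 1.97. The Chebyshev bound is much lower, as expected for a bound that makes no use of normality.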


4.1.1 Finite Population 


Let $\{c_1, c_2, \ldots, c_N\}$ be a finite population. Then the population mean is $\mu = (1/N)\sum_{i=1}^{N} c_i$ and the
population variance is $\sigma^2 = (1/N)\sum_{i=1}^{N} (c_i - \mu)^2$. The following theorem for the mean and
variance of the sample mean is stated without proof.


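Theorem 4.1.2 If $X_1, \ldots, X_n$ is a sample of size n (chosen without replacement) from a population
$\{c_1, c_2, \ldots, c_N\}$, then

\[
E(\bar{X}) = \mu \quad \text{and} \quad \mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}\left(\frac{N - n}{N - 1}\right).
\]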


We remark here that the sample in the theorem is not a random sample and the $X_i$'s are not
iid random variables, because sampling without replacement causes dependence among the $X_i$'s.
The factor $(N - n)/(N - 1)$ in the foregoing theorem is often called the finite population
correction factor. It is close to 1 unless the sample amounts to a significant portion of the
population; that is, it is approximately 1 whenever the sample size n is small relative to the
population size N. Hence, we will not use the finite population correction factor in the derivation
of sampling distributions, unless it is absolutely necessary.
Example 4.1.3 
Obtain the mean and variance of $\bar{X}$ in Example 4.1.1.


Solution 

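First note that for the population in Example 4.1.1, the population mean is $\mu = (1/N)\sum_{i=1}^{N} c_i = 3$ and the
population variance is $\sigma^2 = (1/N)\sum_{i=1}^{N} (c_i - \mu)^2 = 2$. Applying the probability distribution of $\bar{X}$ given in
Example 4.1.1, we obtain

\[
E(\bar{X}) = 2\left(\tfrac{1}{10}\right) + \tfrac{7}{3}\left(\tfrac{1}{10}\right) + \tfrac{8}{3}\left(\tfrac{2}{10}\right) + 3\left(\tfrac{2}{10}\right) + \tfrac{10}{3}\left(\tfrac{2}{10}\right) + \tfrac{11}{3}\left(\tfrac{1}{10}\right) + 4\left(\tfrac{1}{10}\right) = 3
\]

and

\[
\begin{aligned}
\mathrm{Var}(\bar{X}) &= E(\bar{X}^2) - [E(\bar{X})]^2 \\
&= 2^2\left(\tfrac{1}{10}\right) + \left(\tfrac{7}{3}\right)^2\left(\tfrac{1}{10}\right) + \left(\tfrac{8}{3}\right)^2\left(\tfrac{2}{10}\right) + 3^2\left(\tfrac{2}{10}\right) + \left(\tfrac{10}{3}\right)^2\left(\tfrac{2}{10}\right) + \left(\tfrac{11}{3}\right)^2\left(\tfrac{1}{10}\right) + 4^2\left(\tfrac{1}{10}\right) - 3^2 \\
&= \tfrac{28}{3} - 9 = \tfrac{1}{3}.
\end{aligned}
\]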


This is the same as $(\sigma^2/n)\,[(N - n)/(N - 1)] = (2/3)(2/4) = 1/3$. In this case we observe that the variance of $\bar{X}$ is precisely
one sixth of the original variance.
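The agreement between the direct calculation and the finite population formula of Theorem 4.1.2 can be confirmed with exact arithmetic. The following sketch, with illustrative variable names, enumerates all samples of size 3 from {1, 2, 3, 4, 5} and compares the variance of the sample means with $(\sigma^2/n)(N - n)/(N - 1)$.

```python
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4, 5]
N = len(population)
n = 3

# Population mean and variance (dividing by N, as in the text).
mu = Fraction(sum(population), N)
sigma2 = sum((c - mu) ** 2 for c in population) / N

# Sample means of all equally likely samples drawn without replacement.
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]
mean_of_means = sum(sample_means) / len(sample_means)
var_of_means = sum((m - mean_of_means) ** 2 for m in sample_means) / len(sample_means)

# Theorem 4.1.2: (sigma^2 / n) * (N - n) / (N - 1).
fpc = Fraction(N - n, N - 1)
var_from_theorem = sigma2 / n * fpc

print("E(X-bar)               =", mean_of_means)
print("Var(X-bar), enumerated =", var_of_means)
print("Var(X-bar), theorem    =", var_from_theorem)
```

Because the sample takes 3 of only 5 population elements, the correction factor here is 1/2, far from 1.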


Example 4.1.4 
Let $X_1, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2$. Consider the sample
variance

\[
S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2.
\]

Show that $E(S^2) = \sigma^2$.


Solution 
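It can be shown that (see Exercise 1.5.8)

\[
S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right].
\]

Hence, using $E(X_i^2) = \sigma^2 + \mu^2$ and $E(\bar{X}^2) = \sigma^2/n + \mu^2$,

\[
\begin{aligned}
E(S^2) &= \frac{1}{n-1}\left[\sum_{i=1}^{n} E(X_i^2) - nE(\bar{X}^2)\right] \\
&= \frac{1}{n-1}\left[n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right] \\
&= \frac{1}{n-1}\,(n-1)\sigma^2 = \sigma^2.
\end{aligned}
\]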

This shows that the expected value of the sample variance is the same as the variance of the population 
under consideration. 
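A small simulation can make this unbiasedness result concrete. The sketch below is illustrative only: the normal population with $\mu = 10$ and $\sigma^2 = 4$, the sample size, the seed, and the number of replications are all assumptions chosen for the demonstration. It averages the sample variance (divisor n - 1) over many random samples and, for comparison, does the same for the version with divisor n.

```python
import random
import statistics

rng = random.Random(12345)   # fixed seed so the run is repeatable
mu, sigma = 10.0, 2.0        # hypothetical population: N(10, 2^2), so sigma^2 = 4
n = 5                        # size of each random sample
replications = 10_000

s2_values = []       # sample variance with divisor n - 1
biased_values = []   # same sum of squares with divisor n
for _ in range(replications):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    s2_values.append(statistics.variance(sample))        # divisor n - 1
    biased_values.append(statistics.pvariance(sample))   # divisor n

mean_s2 = statistics.fmean(s2_values)
mean_biased = statistics.fmean(biased_values)

print(f"average of S^2 (divisor n - 1): {mean_s2:.3f}")
print(f"average with divisor n:         {mean_biased:.3f}")
```

Under these assumptions the first average should settle near $\sigma^2 = 4$, while the second should settle near $(n - 1)\sigma^2/n = 3.2$.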


EXERCISES 4.1 


4.1.1. Let the population be given by the numbers {—2, —1, 0, 1, 2}. Take all random samples of 
size 3. 
(a) Without replacement, obtain the following in each case. 
(i) The sampling distribution of the sample mean. 

(ii) The sampling distribution of the sample median. 

(iii) The sampling distribution of the sample standard deviation. 

(iv) The mean and variance of the sample mean. 
(b) How many samples of size 3 can we get, if we sample with replacement? 


4.1.2. (a) How many different samples of size n = 2 can be chosen from a finite population of 
size 12 if the sampling is without replacement? 
(b) What is the probability of each sample in part (a), if each sample of size 2 is equally 
likely? 
(c) Find the value of the finite population correction factor. 


4.1.3. Let the population be given by {1, 2, 3}. Let p(x) = 1/3 for x = 1, 2, 3. Take samples of size 
3 with replacement. 
(a) Calculate $\mu$ and $\sigma^2$.
(b) Obtain the sampling distribution of the sample mean. 
(c) Obtain the mean and variance of the sample mean. 


4.1.4. Find the value of the finite population correction factor for
(a) n = 8 and N = 60.
(b) n = 8 and N = 1000.
(c) n = 15 and N = 60.




4.1.5. For a random sample $X_1, \ldots, X_n$, let $(S')^2 = (1/n)\sum_{i=1}^{n} (X_i - \bar{X})^2$. Find $E[(S')^2]$. Compare
this with $E(S^2)$.


4.1.6. For a random sample $X_1, \ldots, X_n$ with mean $\mu$ and variance $\sigma^2$, let $T_n = \sum_{i=1}^{n} X_i$, the sample
total. Show that $E(T_n) = n\mu$ and $\mathrm{Var}(T_n) = n\sigma^2$.


4.1.7. A particular brand of sugar is sold in 5-lb packages. The weight of sugar in these packages 
can be assumed to be normally distributed with mean $\mu = 5$ lb and standard deviation
$\sigma = 2$ lb. What is the probability that the mean weight of sugar in 15 randomly selected
packages will be within 0.2 lb of 5 lb? 


4.1.8. A random sample of size 150 is taken from an infinite population having the mean $\mu = 15$
and standard deviation $\sigma = 2.5$. What is the probability that $\bar{X}$ will be between 10.5 and
18.5? 


4.1.9. The distribution of heights of all students in a large university has a normal distribution 
with a mean of 66 inches and a standard deviation of 2 inches. What is the probability that 
the mean height of 26 randomly selected students from this university will be more than 
70 inches? 


4.1.10. An image-encoding algorithm, when used to encode images of a certain size, uses a mean 
of 110 milliseconds with a standard deviation of 15 milliseconds. What is the probability 
that the mean time (in milliseconds) for encoding 50 randomly selected images of this size 
will be between 90 milliseconds and 135 milliseconds? What assumptions do we need to 
make? 


4.1.11. In order to evaluate a new release of a database management system, a database
administrator runs a benchmark program several times and measures the time to completion in
seconds. Assuming that the distribution of times is normal with mean 95 seconds and with 
standard deviation of 10 seconds, what proportion of measurement times will fall below 85 
seconds? 


4.1.12. A population of disk drives manufactured by a certain company runs with mean seek time of 
10 milliseconds with standard deviation of 0.1 milliseconds. What proportion of samples of 
size 250 would you expect to result in a mean less than 9 milliseconds? What assumptions 
do we need to make? 


4.1.13. Suppose that the national norm of a science test for 12th graders on a particular year has a 

mean of 215 and a standard deviation of 35. 

(a) A random sample of 55 12th graders is selected. What is the probability that this group
will average more than 230?

(b) A random sample of 200 12th graders is selected. What is the probability that this group
will average over 230?

(c) A random sample of 35 12th graders is selected. What is the probability that this group
will average over 230?

(d) How does the sample size influence the probability?


4.1.14. Scores on the Wechsler Adult Intelligence Scale for the 20 to 34 age group are approximately
normally distributed with mean equal to 110 and standard deviation equal to 25. If we select 
100 people at random, what is the probability that this group will have an average score of 
125 or above? 


4.1.15. It is known that a healthy human body has an average temperature of 98.6°F with a standard
deviation of 0.95°F. Sixty healthy humans are selected at random. What is the probability
that their temperatures average at least 99.1°F? 


4.2 SAMPLING DISTRIBUTIONS ASSOCIATED WITH NORMAL POPULATIONS 


The sampling distribution of a statistic will depend upon the population distribution from which 
the samples are taken. In this section we discuss the sampling distributions of some statistics that 
are based on a random sample drawn from a normal distribution. These statistics are used in many 
statistical procedures that are very important in solving real-world problems. The following result 
establishes the distribution of a linear combination of independent normal random variables. 


Theorem 4.2.1 Let $X_1, \ldots, X_n$ be independent random variables with the distribution of $X_i$ being normal
with mean $\mu_i$ and variance $\sigma_i^2$. Let $a_1, a_2, \ldots, a_n$ be real constants. Then the distribution of $Y = \sum_{i=1}^{n} a_i X_i$
is normal with mean $\mu_Y = \sum_{i=1}^{n} a_i\mu_i$ and variance $\sigma_Y^2 = \sum_{i=1}^{n} a_i^2\sigma_i^2$.


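Proof. The moment-generating function of Y is given by

\[
\begin{aligned}
M_Y(t) &= E\left[e^{\left(\sum_{i} a_i X_i\right)t}\right] \\
&= \prod_{i} E\left[e^{a_i X_i t}\right] \quad \text{[by independence of the $X_i$'s]} \\
&= \prod_{i} E\left[e^{(a_i t)X_i}\right] \\
&= \prod_{i} M_{X_i}(a_i t) \quad \text{[using the definition of mgf]} \\
&= \prod_{i} e^{a_i\mu_i t + (1/2)a_i^2\sigma_i^2 t^2} \quad \text{[using the mgf of a normal]} \\
&= e^{\left(\sum_{i} a_i\mu_i\right)t + (1/2)\left(\sum_{i} a_i^2\sigma_i^2\right)t^2},
\end{aligned}
\]

which is the mgf of a normal random variable with mean $\sum_{i} a_i\mu_i$ and variance $\sum_{i} a_i^2\sigma_i^2$.

If in Theorem 4.2.1 we let $a_i = 1/n$, $\mu_i = \mu$, and $\sigma_i^2 = \sigma^2$, we obtain the following result, which provides
the distribution of the sample mean.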


Corollary 4.2.2 Let $X_1, \ldots, X_n$ be a random sample of size n from a normal population with mean $\mu$ and
variance $\sigma^2$. Then

\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i
\]

is normally distributed with mean $\mu_{\bar{X}} = \mu$ and variance $\sigma^2_{\bar{X}} = \sigma^2/n$.




Recall that we have used the notation $X \sim N(\mu, \sigma^2)$ to mean that the random variable X is normally
distributed with mean $\mu$ and variance $\sigma^2$. From Corollary 4.2.2, $\bar{X} \sim N(\mu, \sigma^2/n)$ and hence by the
z-transformation we obtain the standard normal random variable, $Z = (\bar{X} - \mu)/(\sigma/\sqrt{n}) \sim N(0, 1)$.
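Theorem 4.2.1 can be illustrated with a short Monte Carlo sketch. The particular linear combination $Y = 2X_1 - 3X_2$ with $X_1 \sim N(1, 2^2)$ and $X_2 \sim N(-1, 1^2)$, the seed, and the number of draws are hypothetical choices made only for this demonstration. The code compares the simulated mean and variance of Y with the values predicted by the theorem and checks one feature of normality, the proportion of draws within one standard deviation of the mean.

```python
import random
import statistics

rng = random.Random(7)

# Hypothetical inputs: X1 ~ N(1, 2^2), X2 ~ N(-1, 1^2), and Y = 2*X1 - 3*X2.
coefficients = [2.0, -3.0]
means = [1.0, -1.0]
std_devs = [2.0, 1.0]

# Mean and variance predicted by Theorem 4.2.1.
predicted_mean = sum(a * m for a, m in zip(coefficients, means))
predicted_variance = sum(a**2 * s**2 for a, s in zip(coefficients, std_devs))

# Monte Carlo draws of Y.
draws = [
    sum(a * rng.gauss(m, s) for a, m, s in zip(coefficients, means, std_devs))
    for _ in range(50_000)
]
simulated_mean = statistics.fmean(draws)
simulated_variance = statistics.variance(draws)

# For a normal distribution, about 68.27% of values lie within one standard deviation.
predicted_sd = predicted_variance ** 0.5
within_one_sd = sum(abs(y - predicted_mean) < predicted_sd for y in draws) / len(draws)

print(f"predicted mean {predicted_mean}, simulated {simulated_mean:.3f}")
print(f"predicted variance {predicted_variance}, simulated {simulated_variance:.3f}")
print(f"proportion within one sd: {within_one_sd:.4f}")
```

A simulation of this kind does not prove the theorem; it only shows that the predicted mean 5 and variance 25 are consistent with the simulated draws.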


Example 4.2.1 
A company that manufactures cars claims that the gas mileage for its new line of hybrid cars, on the average, 
is 60 miles per gallon with a standard deviation of 4 miles per gallon. A random sample of 16 cars yielded a 
mean of 57 miles per gallon. If the company’s claim is correct, what is the probability that the sample mean 
is less than or equal to 57 miles per gallon? Comment on the company’s claim about the mean gas mileage 
per gallon of its cars. What assumptions did you make? 


Solution 
Let X represent the gas mileage for the new car (in miles per gallon). If the company’s claim is true, then
from Corollary 4.2.2, $\bar{X}$ is normally distributed with mean $\mu = 60$ and variance $\sigma^2/n = 16/16 = 1$. Hence,

\[
P(\bar{X} \leq 57) = P\left(\frac{\bar{X} - 60}{1} \leq \frac{57 - 60}{1}\right) = P(Z \leq -3) \approx 1 - 0.999 = 0.001.
\]


Therefore, if the company’s claim is correct, it is very unlikely that the mean value of the random sample of
16 cars will be 57 miles per gallon or less. Because the observed mean is indeed 57 miles per gallon, we
conclude that the company’s claim is very likely not true. Here we have assumed that the sample of 16
measurements comes from a normal population, so that we could apply the results of Corollary 4.2.2.


Now we introduce some distributions that can be derived from a normal distribution. These 
distributions play a very important role in inferential problems. 


4.2.1 Chi-Square Distribution 


A chi-square distribution is used in many inferential problems, for example, in inferential problems
dealing with the variance. Recall that the chi-square distribution is a special case of a gamma
distribution with $\alpha = n/2$ and $\beta = 2$. If n is a positive integer, then the parameter n is called the degrees of
freedom. However, if n is not an integer, but $\beta = 2$, we still refer to this distribution as a chi-square.
The mgf of a $\chi^2$-random variable is $M(t) = (1 - 2t)^{-n/2}$. The mean and variance of a chi-square
distribution are $\mu = n$ and $\sigma^2 = 2n$, respectively. That is, the mean of a $\chi^2(n)$ random variable is
equal to its degrees of freedom and the variance is twice the degrees of freedom. We now give some
useful results for $\chi^2$-random variables.


Theorem 4.2.3 Let $X_1, \ldots, X_k$ be independent $\chi^2$-random variables with $n_1, \ldots, n_k$ degrees of freedom,
respectively. Then the sum $V = \sum_{i=1}^{k} X_i$ is chi-square distributed with $n_1 + n_2 + \cdots + n_k$ degrees of
freedom.
Proof. The mgf of V is

\[
M_V(t) = \prod_{i=1}^{k} (1 - 2t)^{-n_i/2} = (1 - 2t)^{-\sum_{i=1}^{k} n_i/2}.
\]

This implies that $V \sim \chi^2\left(\sum_{i=1}^{k} n_i\right)$.

Our next result states that, under suitable conditions, the difference of two chi-square random variables
is again a chi-square random variable. The proof is left as an exercise.


Theorem 4.2.4 Let $X_1$ and $X_2$ be independent random variables. Suppose that $X_1$ is $\chi^2$ with $n_1$ degrees of
freedom, whereas $Y = X_1 + X_2$ is chi-square with n degrees of freedom, where $n > n_1$. Then $X_2 = Y - X_1$
is a chi-square random variable with $n - n_1$ degrees of freedom.


The following result shows that we can generate a chi-square random variable from a gamma random 
variable. 


Theorem 4.2.5 If a random variable X has a gamma distribution with parameters $\alpha$ and $\beta$, then

\[
Y = \frac{2X}{\beta} \sim \chi^2(2\alpha).
\]


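Proof. Recall that the mgf of the gamma random variable X is $M_X(t) = (1 - \beta t)^{-\alpha}$. Then

\[
M_Y(t) = M_{2X/\beta}(t) = E\left(e^{(2X/\beta)t}\right) = M_X\left(\frac{2t}{\beta}\right) = \left(1 - \beta\,\frac{2t}{\beta}\right)^{-\alpha} = (1 - 2t)^{-2\alpha/2}.
\]

Hence, $Y \sim \chi^2(2\alpha)$.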


The following result states that by squaring a standard normal random variable, we can generate a 
chi-square random variable, with one degree of freedom. 


Theorem 4.2.6 If X is a standard normal random variable, then $X^2$ is a chi-square random variable with 1 degree of freedom.


Proof. Because $X \sim N(0, 1)$, the moment-generating function of $X^2$ is

\[
M_{X^2}(t) = \int_{-\infty}^{\infty} e^{tx^2}\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx = (1 - 2t)^{-1/2}, \quad t < 1/2.
\]

This implies that $X^2 \sim \chi^2(1)$. Figure 4.1 gives the probability densities of the random variables X
and $X^2$.
FIGURE 4.1 pdf of standard normal r.v. and the pdf of its square.


The following result is a direct consequence of Theorems 4.2.3 and 4.2.6. This result illustrates how to
obtain a chi-square random variable if we have a random sample of n measurements
from a normal population.


Theorem 4.2.7 Let the random sample $X_1, \ldots, X_n$ be from an $N(\mu, \sigma^2)$ distribution. Then $Z_i = (X_i - \mu)/\sigma$,
$i = 1, \ldots, n$, are independent standard normal random variables and

\[
\sum_{i=1}^{n} Z_i^2 = \sum_{i=1}^{n} \left(\frac{X_i - \mu}{\sigma}\right)^2
\]

has a $\chi^2$-distribution with n degrees of freedom. In particular, if $X_1, \ldots, X_n$ are independent standard normal
random variables, then $Y^2 = \sum_{i=1}^{n} X_i^2$ is chi-square distributed with n degrees of freedom.


If $X \sim \chi^2(n)$, then from the chi-square table, we can obtain the values $\chi^2_{\alpha}(n)$ such that

\[
P\left(X > \chi^2_{\alpha}(n)\right) = \alpha,
\]

as shown by Figure 4.2.


For example, if $X \sim \chi^2(15)$, to find $\chi^2_{0.95}(15)$ look in the chi-square table at the row labeled 15 d.f.
and the column headed $\chi^2_{0.950}$ and obtain the value 7.26094. Thus, with 15 degrees of freedom,
$P(X > 7.26094) = 0.95$. Also, if X is a chi-square random variable with 11 degrees of freedom, from
the chi-square table we have $\chi^2_{0.05}(11) = 19.675$. Therefore, $P(X > 19.675) = 0.05$.
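When a chi-square table is not at hand, the upper-tail probabilities quoted above can be checked numerically. The sketch below is one possible approach, not the method used to build the table: it uses the fact, recalled earlier, that a $\chi^2(n)$ variable is a gamma variable with $\alpha = n/2$ and $\beta = 2$, and evaluates the gamma distribution function through the standard power series for the regularized incomplete gamma function. The function names are illustrative.

```python
from math import exp, lgamma, log


def regularized_lower_gamma(a, x, tolerance=1e-14, max_terms=10_000):
    """Series evaluation of P(a, x), the gamma(a) distribution function at x (a > 0)."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    for k in range(1, max_terms):
        term *= x / (a + k)          # next term: x^k / (a (a+1) ... (a+k))
        total += term
        if term < tolerance * total:
            break
    return total * exp(-x + a * log(x) - lgamma(a))


def chi_square_upper_tail(x, df):
    """P(X > x) for a chi-square random variable with df degrees of freedom."""
    return 1.0 - regularized_lower_gamma(df / 2.0, x / 2.0)


print(f"P(X > 7.26094), 15 d.f.: {chi_square_upper_tail(7.26094, 15):.4f}")
print(f"P(X > 19.675),  11 d.f.: {chi_square_upper_tail(19.675, 11):.4f}")
```

The series converges for every positive argument, although for very large arguments a continued-fraction expansion is usually preferred.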
FIGURE 4.2 Chi-square probability density.


Example 4.2.2 
Let the random variables $X_1, X_2, \ldots, X_5$ be from an $N(5, 1)$ distribution. Find a number a such that

\[
P\left(\sum_{i=1}^{5} (X_i - 5)^2 \leq a\right) = 0.90.
\]

Solution
By Theorem 4.2.7, $\sum_{i=1}^{5} Z_i^2 = \sum_{i=1}^{5} \left(\frac{X_i - 5}{1}\right)^2 = \sum_{i=1}^{5} (X_i - 5)^2$ has a chi-square distribution with 5 degrees of
freedom. Because the upper tail area is 0.10, looking at the chi-square table with 5 d.f. and the column
corresponding to $\chi^2_{0.10}$, we obtain a = 9.23635. Thus,

\[
P\left(\sum_{i=1}^{5} (X_i - 5)^2 \leq 9.23635\right) = 0.90.
\]


Example 4.2.3 
Suppose that X is a $\chi^2$-random variable with 20 degrees of freedom. Use the chi-square table to obtain
the following:

(a) Find $x_0$ such that $P(X > x_0) = 0.95$.

(b) Find $P(X < 12.443)$.


Solution
(a) For 20 degrees of freedom, using the chi-square table, we have

\[
P(X > 10.851) = 0.95.
\]

Hence, $x_0 = 10.851$.

(b) From the chi-square table,

\[
P(X < 12.443) = 0.10.
\]

The following result gives the probability distribution for a function of the sample variance $S^2$.


Theorem 4.2.8 If $X_1, \ldots, X_n$ is a random sample from a normal population with mean $\mu$ and variance
$\sigma^2$, then

(a) the random variable

\[
\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n} (X_i - \bar{X})^2
\]

has a chi-square distribution with (n - 1) degrees of freedom.
(b) $\bar{X}$ and $S^2$ are independent.

Proof. We will only prove part (a). For part (b), we will give some comments on the proof.


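(a) We know from Theorem 4.2.7 that $(1/\sigma^2)\sum_{i=1}^{n} (X_i - \mu)^2$ has a chi-square distribution with n
degrees of freedom. Now,

\[
\begin{aligned}
\frac{1}{\sigma^2}\sum_{i=1}^{n} (X_i - \mu)^2 &= \frac{1}{\sigma^2}\sum_{i=1}^{n} (X_i - \bar{X} + \bar{X} - \mu)^2 \\
&= \frac{1}{\sigma^2}\left[\sum_{i=1}^{n} (X_i - \bar{X})^2 + n(\bar{X} - \mu)^2\right] \quad \left(\text{since } 2\sum_{i=1}^{n} (X_i - \bar{X})(\bar{X} - \mu) = 0\right) \\
&= \frac{(n-1)S^2}{\sigma^2} + \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right)^2.
\end{aligned}
\]

The left-hand side of this equation has a chi-square distribution with n degrees of freedom.
Also, since $(\bar{X} - \mu)/(\sigma/\sqrt{n}) \sim N(0, 1)$, by Theorem 4.2.6 we have $[(\bar{X} - \mu)/(\sigma/\sqrt{n})]^2 \sim \chi^2(1)$.
Because the two terms on the right-hand side are independent by part (b), Theorem 4.2.4 gives
$(n - 1)S^2/\sigma^2 \sim \chi^2(n - 1)$.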


(b) We will accept the result of part (b) without proof here. A rigorous proof depends on
geometric properties of the multivariate normal distribution, which are beyond the scope
of this book. A proof based on moment-generating functions is relatively straightforward,
where essentially we can first show that the random variable $\bar{X}$ and the vector of random
variables $(X_1 - \bar{X}, \ldots, X_n - \bar{X})$ are independent. Because $S^2$ is a function of the vector
$(X_1 - \bar{X}, \ldots, X_n - \bar{X})$, it is then independent of $\bar{X}$.


Example 4.2.4 
Let $X_1, X_2, \ldots, X_{10}$ be a random sample from a normal distribution with $\sigma^2 = 0.8$. Find two positive
numbers a and b such that the sample variance $S^2$ satisfies

\[
P(a \leq S^2 \leq b) = 0.90.
\]


Solution
Because $(n - 1)S^2/\sigma^2 \sim \chi^2(n - 1)$, we have

\[
P(a \leq S^2 \leq b) = P\left(\frac{(n-1)a}{\sigma^2} \leq \frac{(n-1)S^2}{\sigma^2} \leq \frac{(n-1)b}{\sigma^2}\right).
\]

The desired values can be found by setting the upper tail area and lower tail area each equal to 0.05. Using
the chi-square table with n - 1 = 9 degrees of freedom, we have

\[
\frac{(n-1)b}{\sigma^2} = \frac{9b}{0.8} = 16.919 = \chi^2_{0.05,9},
\]

which implies b = (16.919)(0.8)/9 = 1.50. Similarly,

\[
\frac{(n-1)a}{\sigma^2} = \frac{9a}{0.8} = 3.325 = \chi^2_{0.95,9}.
\]

So we have a = (3.325)(0.8)/9 = 0.295. Hence,

\[
P(0.295 \leq S^2 \leq 1.50) = 0.90.
\]

It is important to note that this is not the only interval that would satisfy $P(a \leq S^2 \leq b) = 0.90$,
but it is a convenient one.
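The interval found in Example 4.2.4 can be checked by simulation. In the sketch below the population mean of 0, the seed, and the number of replications are assumptions made for the demonstration; the distribution of $S^2$ does not depend on the mean. The code draws many normal samples of size 10 with $\sigma^2 = 0.8$ and records how often the sample variance falls between 0.295 and 1.50.

```python
import random

rng = random.Random(2023)
sigma = 0.8 ** 0.5    # population standard deviation, since sigma^2 = 0.8
mu = 0.0              # hypothetical mean; it does not affect the distribution of S^2
n = 10
a, b = 0.295, 1.50    # interval end points from the example
replications = 20_000

inside = 0
for _ in range(replications):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    sample_mean = sum(sample) / n
    s2 = sum((x - sample_mean) ** 2 for x in sample) / (n - 1)   # sample variance
    if a <= s2 <= b:
        inside += 1

coverage = inside / replications
print(f"estimated P({a} <= S^2 <= {b}) = {coverage:.3f}")
```

Under these assumptions the estimated proportion should be close to 0.90, up to simulation error and the rounding of a and b.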


Example 4.2.5 
A fruit-drink company wants to know the variation, as measured by the standard deviation, of the amount
of juice in 16-ounce cans. From past experience, it is known that $\sigma^2 = 2$. The company statistician decides
to take a sample of 25 cans from the production line and compute the sample variance. Assuming that the
sample values may be viewed as a random sample from a normal population, find a value of b such that
$P(S^2 \geq b) = 0.05$.
Solution
To find the required value, use the fact that $(n - 1)S^2/\sigma^2 \sim \chi^2(n - 1)$, with n = 25:

\[
0.05 = P(S^2 \geq b) = P\left(\frac{24S^2}{2} \geq \frac{24b}{2}\right) = P(\chi^2 \geq c),
\]

where c = 24b/2 = 12b. From the chi-square table with 24 degrees of freedom we obtain c = 36.4151. Hence,
b = c/12 = (1/12)(36.4151) = 3.03 and

\[
P(S^2 \geq 3.03) = 0.05.
\]


SUMMARY OF CHI-SQUARE DISTRIBUTION 
Let $X_1, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ random variables. Then
1. $\bar{X}$ has an $N(\mu, \sigma^2/n)$ distribution,
2. $(n - 1)S^2/\sigma^2$ has a chi-square distribution with (n - 1) degrees of freedom, and
3. $\bar{X}$ and $S^2$ are independent.
4. A $\chi^2$-random variable has a mean equal to its degrees of freedom and a variance equal to twice its
degrees of freedom.

4.2.2 Student t-Distribution 

Let the random variables $X_1, \ldots, X_n$ follow a normal distribution with mean $\mu$ and variance $\sigma^2$.
If $\sigma$ is known, then we know that $\sqrt{n}(\bar{X} - \mu)/\sigma$ is N(0, 1). However, if $\sigma$ is not known (as is
usually the case), then it is routinely replaced by the sample standard deviation S. If the sample
size is large, one could suppose that $S \approx \sigma$ and apply the Central Limit Theorem and obtain that
$\sqrt{n}(\bar{X} - \mu)/S$ is approximately N(0, 1). However, if the random sample is small, then the
distribution of $\sqrt{n}(\bar{X} - \mu)/S$ is given by the so-called Student t-distribution (or simply t-distribution).
This was originally developed by W. S. Gosset in 1908. Because his employer, the Guinness brewery,
would not permit him to publish this important work in his own name, he used the pseudonym
“Student.” Thus, the distribution is known as the Student t-distribution.


Definition 4.2.2 If Y and Z are independent random variables, Y has a chi-square distribution with n
degrees of freedom, and $Z \sim N(0, 1)$, then

\[
T = \frac{Z}{\sqrt{Y/n}}
\]

is said to have a (Student) t-distribution with n degrees of freedom. We denote this by $T \sim T_n$.


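The probability density of the random variable T with n degrees of freedom is given by

\[
f(t) = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{\pi n}\,\Gamma\left(\frac{n}{2}\right)}\left(1 + \frac{t^2}{n}\right)^{-(n+1)/2}, \quad -\infty < t < \infty.
\]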
FIGURE 4.3 The Student t-distribution (t densities for n = 2, n = 10, n = 20, and n = 30).


Figure 4.3 illustrates the behavior of the t-distributions for n = 2, 10, 20, and 30. It is clear from
Figure 4.3 that as n becomes larger and larger, it is almost impossible to distinguish the graphs. It can 
be shown that the t-distribution tends to a standard normal distribution as the degrees of freedom 
(equivalently, the sample size n) tend to infinity. In fact, the standard normal distribution provides a good 
approximation to the t-distribution for sample sizes of 30 or more. We will use this approximation in the 
statistical inference problems for n > 30. 


The t-density is symmetric about zero, and hence $E(T) = 0$ for $n > 1$. If $n > 2$, it can be shown that
$\mathrm{Var}(T) = n/(n - 2)$. The value of $t_{\alpha,n}$ such that $P(T > t_{\alpha,n}) = \alpha$ (the shaded area in Figure 4.4) is
obtained from the t-table. For example, if a random variable X has a t-distribution with 9 degrees of
freedom and $\alpha = 0.01$, then $t_{0.01,9} = 2.821$.


If we have a random sample from a normal population, the following result involving a t-distribution 
is useful in applications. 


Theorem 4.2.9 If $\bar{X}$ and $S^2$ are the mean and the variance of a random sample of size n from a normal
population with mean $\mu$ and variance $\sigma^2$, then

\[
T = \frac{\bar{X} - \mu}{S/\sqrt{n}}
\]

has a t-distribution with (n - 1) degrees of freedom.

Proof. By Corollary 4.2.2,

\[
Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1).
\]
FIGURE 4.4 Probability of t-distribution.


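By Theorem 4.2.8, we have

\[
Y = \frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n} (X_i - \bar{X})^2 \sim \chi^2(n - 1).
\]

Hence,

\[
T = \frac{\bar{X} - \mu}{S/\sqrt{n}} = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{\dfrac{(n-1)S^2}{\sigma^2(n-1)}}} = \frac{Z}{\sqrt{Y/(n-1)}}.
\]

Also, $\bar{X}$ and $S^2$ are independent. Thus, Y and Z are independent, and by Definition 4.2.2, T follows
a t-distribution with (n - 1) degrees of freedom.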


How can we distinguish between given degrees of freedom and the degrees of freedom from a sample?
For the t-distribution, if n is given as the degrees of freedom, we will just use n. However, if a random
sample of size n is given, then the corresponding degrees of freedom will be (n - 1), as given in
Theorem 4.2.9.


The assumption that the sample comes from a normal population is not that onerous. In practice, it
is necessary to check that the sampled population is approximately bell shaped and not heavily
skewed. Construction of the normal-scores plot or histogram is a way to check for approximate
normality. See Project 4C.
Example 4.2.6
A manufacturer of fuses claims that with 20% overload, the fuses will blow in less than 10 minutes on the 
average. To test this claim, a random sample of 20 of these fuses was subjected to a 20% overload, and the 
times it took them to blow had the mean of 10.4 minutes and a sample standard deviation of 1.6 minutes. 
It can be assumed that the data constitute a random sample from a normal population. Do they tend to 
support or refute the manufacturer's claim? 


Solution 
Given $\bar{x} = 10.4$, $s = 1.6$, $n = 20$, and $\mu = 10$. Hence

\[
t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{10.4 - 10}{1.6/\sqrt{20}} = 1.118.
\]

The number of degrees of freedom is n - 1 = 19. From the t-table, the probability that t exceeds 1.328 is 0.10, and
because the observed value of t = 1.118 is less than $t_{0.10,19} = 1.328$ and 0.10 is a pretty large probability,
we conclude that the data tend to agree with the manufacturer's claim.
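The arithmetic of this example, and the table value it relies on, can be checked with a short sketch. The function name, the seed, and the number of replications are illustrative assumptions. The first part computes the t statistic of Theorem 4.2.9. The second part simulates that statistic for normal samples of size 20 (whose true mean is the hypothesized one) and estimates the probability of exceeding 1.328, which the t-table gives as 0.10 for 19 degrees of freedom.

```python
import random
from math import sqrt


def one_sample_t_statistic(sample_mean, hypothesized_mean, sample_sd, n):
    """t = (x-bar - mu) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (sample_mean - hypothesized_mean) / (sample_sd / sqrt(n))


n = 20
t_value = one_sample_t_statistic(
    sample_mean=10.4, hypothesized_mean=10.0, sample_sd=1.6, n=n
)
critical_value = 1.328   # t_{0.10,19}, copied from the t-table quoted in the text

# Monte Carlo check of the table value, using standard normal samples (mu = 0).
rng = random.Random(19)
replications = 20_000
exceed = 0
for _ in range(replications):
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    mean = sum(sample) / n
    sd = sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    if one_sample_t_statistic(mean, 0.0, sd, n) > critical_value:
        exceed += 1
tail_estimate = exceed / replications

print(f"observed t = {t_value:.3f}, below critical value: {t_value < critical_value}")
print(f"estimated P(T > {critical_value}) with {n - 1} d.f. = {tail_estimate:.3f}")
```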


We will study the problems of the foregoing type in Chapter 7, where we will be learning about 
hypothesis testing. Prior to Student's work on the t-distribution, a very large number of observations 
were necessary for design and analysis of experiments. Today, the use of the t-distribution often 
makes it possible to draw reliable conclusions from samples as small as 15 to 30 experimental units, 
provided that the samples are representative of their populations and that normality could reasonably 
be assumed or justified for the population. 


Example 4.2.7 
The human gestation period—the period of time between conception and labor—is approximately 40 
weeks (280 days), measured from the first day of the mother’s last menstrual period. For a newborn
full-term infant, the length appropriate for gestational age is assumed to be normally distributed with $\mu = 50$
centimeters and $\sigma = 1.25$ centimeters. Compute the probability that a random sample of 20 infants born
at full term results in a sample mean greater than 52.5 centimeters. 


Solution 
Let X be the length (measured in centimeters) of a newborn full-term infant. Then $\bar{X} \sim N(50, 1.5625/20)$. Hence

\[
P(\bar{X} > 52.5) = P\left(Z > \frac{52.5 - 50}{1.25/\sqrt{20}}\right) = P(Z > 8.94) \approx 0.
\]

Thus, the probability of such an occurrence is negligible.

In the previous example, it should be noted that $P(\bar{X} > 52.5) \approx 0$ does not imply that the probability
of observing a single newborn full-term infant with length greater than 52.5 centimeters is zero. In fact,
for a single infant, $P(X > 52.5) = P(Z > 2) \approx 0.023$.
4.2.3 F-Distribution


The F-distribution was developed by Fisher to study the behavior of two variances from random
samples taken from two independent normal populations. In applied problems we may be interested
in knowing whether the population variances are equal or not, based on the response of the
random samples. Knowing the answer to such a question is also important in selecting the appropriate
statistical methods to study their true means.


Definition 4.2.3 Let U and V be chi-square random variables with $n_1$ and $n_2$ degrees of freedom, respectively.
Then if U and V are independent,

\[
F = \frac{U/n_1}{V/n_2}
\]

is said to have an F-distribution with $n_1$ numerator degrees of freedom and $n_2$ denominator degrees of
freedom. We denote this by $F \sim F(n_1, n_2)$.


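The pdf for a random variable $X \sim F(n_1, n_2)$ is given by

\[
f(x) =
\begin{cases}
\dfrac{\Gamma((n_1 + n_2)/2)}{\Gamma(n_1/2)\,\Gamma(n_2/2)}\left(\dfrac{n_1}{n_2}\right)^{n_1/2} x^{n_1/2 - 1}\left(1 + \dfrac{n_1}{n_2}\,x\right)^{-(n_1 + n_2)/2}, & 0 < x < \infty, \\[2ex]
0, & \text{elsewhere.}
\end{cases}
\]

A graph of f(x) for various values of $n_1$ and $n_2$ is given in Figure 4.5.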


FIGURE 4.5 pdfs of F-distribution (F densities with $n_1 = 3$, $n_2 = 2$ and $n_1 = 12$, $n_2 = 6$).


FIGURE 4.6 F-distribution probability.
To find $F_{\alpha}(n_1, n_2)$ such that $P(F > F_{\alpha}(n_1, n_2)) = \alpha$ (shaded area in Figure 4.6), we use the F-table.
For example, if F has 3 numerator and 6 denominator degrees of freedom, then $F_{0.01}(3, 6) = 9.78$.


If we know $F_{\alpha}(n_1, n_2)$, it is possible to find $F_{1-\alpha}(n_2, n_1)$ by using the identity

\[
F_{1-\alpha}(n_2, n_1) = \frac{1}{F_{\alpha}(n_1, n_2)}.
\]

Using this identity we can obtain $F_{0.99}(6, 3) = 1/F_{0.01}(3, 6) = 1/9.78 = 0.10225$.
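Both the table value and the reciprocal identity can be illustrated by simulation. The sketch below builds F(3, 6) variates directly from Definition 4.2.3, generating each chi-square variable as a gamma variate with shape n/2 and scale 2 through `random.gammavariate`. The seed, the number of replications, and the helper names are assumptions made for the demonstration.

```python
import random

rng = random.Random(36)


def chi_square_draw(df):
    """One chi-square(df) draw, generated as a gamma variate with shape df/2 and scale 2."""
    return rng.gammavariate(df / 2.0, 2.0)


def f_draw(n1, n2):
    """One F(n1, n2) draw built from Definition 4.2.3: (U/n1) / (V/n2)."""
    return (chi_square_draw(n1) / n1) / (chi_square_draw(n2) / n2)


replications = 100_000
draws = [f_draw(3, 6) for _ in range(replications)]

# Upper tail beyond the table value F_0.01(3, 6) = 9.78 should be close to 0.01.
upper_tail = sum(f > 9.78 for f in draws) / replications

# Reciprocals of F(3, 6) variates follow F(6, 3); the proportion of them
# exceeding 1/9.78 should be close to 0.99.
reciprocal_tail = sum(1.0 / f > 1.0 / 9.78 for f in draws) / replications

print(f"estimated P(F(3,6) > 9.78)   = {upper_tail:.4f}")
print(f"estimated P(F(6,3) > 1/9.78) = {reciprocal_tail:.4f}")
```

The second estimate is the complement of the first by construction, which is exactly what the identity expresses.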


When we need to compare the variances of two normal populations, we will use the following result. 


Theorem 4.2.10 Let two independent random samples of size $n_1$ and $n_2$ be drawn from two normal
populations with variances $\sigma_1^2$, $\sigma_2^2$, respectively. If the variances of the random samples are given by $S_1^2$, $S_2^2$,
respectively, then the statistic

\[
F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}
\]

has the F-distribution with $(n_1 - 1)$ numerator and $(n_2 - 1)$ denominator degrees of freedom.

Proof. From Theorem 4.2.8, we know that

\[
U = \frac{(n_1 - 1)S_1^2}{\sigma_1^2} \sim \chi^2(n_1 - 1)
\]

and

\[
V = \frac{(n_2 - 1)S_2^2}{\sigma_2^2} \sim \chi^2(n_2 - 1).
\]

Also, U and V are independent. From Definition 4.2.3, $F = \dfrac{U/(n_1 - 1)}{V/(n_2 - 1)} \sim F(n_1 - 1, n_2 - 1)$.
Corollary 4.2.11 If $\sigma_1^2 = \sigma_2^2$, then

\[
F = \frac{S_1^2}{S_2^2} \sim F(n_1 - 1, n_2 - 1).
\]

When $\sigma_1^2 = \sigma_2^2$, we refer to them as two populations that are homogeneous with respect to their
variances.


Example 4.2.8
Let $S_1^2$ denote the sample variance for a random sample of size 10 from Population I and let $S_2^2$ denote the
sample variance for a random sample of size 8 from Population II. The variance of Population I is assumed to
be three times the variance of Population II. Find two numbers $a$ and $b$ such that $P(a < S_1^2/S_2^2 < b) = 0.90$,
assuming $S_1^2$ to be independent of $S_2^2$.


Solution
From the problem, we can assume that $\sigma_1^2 = 3\sigma_2^2$ with $n_1 = 10$ and $n_2 = 8$. Thus, we can write

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{S_1^2/(3\sigma_2^2)}{S_2^2/\sigma_2^2} = \frac{S_1^2}{3S_2^2};$$

this has an F-distribution with $n_1 - 1 = 9$ numerator and $n_2 - 1 = 7$ denominator degrees of freedom. Using
the F-table, $F_{0.05}(9, 7) = 3.68$. Now, to find $F_{0.95}(9, 7)$ such that

$$P\left(\frac{S_1^2}{3S_2^2} < F_{0.95}(9, 7)\right) = 0.05,$$

we proceed as follows:


Indexing $\nu_1 = 7$ and $\nu_2 = 9$ in the F-table, we have $1/F_{0.95}(9, 7) = F_{0.05}(7, 9) = 3.29$, or
$F_{0.95}(9, 7) = 1/3.29 = 0.304$. Hence, the entire probability statement is

$$P\left(0.304 < \frac{S_1^2}{3S_2^2} < 3.68\right) = P\left(0.912 < \frac{S_1^2}{S_2^2} < 11.04\right) = 0.90.$$

Thus, $a = 0.912$ and $b = 11.04$.
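
A simulation offers a rough check of this answer. The sketch below is ours, not the text's: it takes $\sigma_2^2 = 1$ and $\sigma_1^2 = 3$ (any values with that ratio would do), draws the two normal samples repeatedly, and estimates the probability that the variance ratio falls between 0.912 and 11.04. The seed and the number of replications are arbitrary choices.

```python
import random

def sample_variance(values):
    """Unbiased sample variance S^2 with divisor n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def ratio_probability(a, b, reps=20000, seed=1):
    """Estimate P(a < S1^2 / S2^2 < b) for the setting of Example 4.2.8."""
    rng = random.Random(seed)
    sd1 = 3 ** 0.5  # assumed: sigma_2 = 1, so sigma_1 = sqrt(3)
    hits = 0
    for _ in range(reps):
        s1_sq = sample_variance([rng.gauss(0.0, sd1) for _ in range(10)])
        s2_sq = sample_variance([rng.gauss(0.0, 1.0) for _ in range(8)])
        if a < s1_sq / s2_sq < b:
            hits += 1
    return hits / reps

estimate = ratio_probability(0.912, 11.04)
print(estimate)
```

The estimate should land near 0.90, up to simulation error and the rounding of the table values.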


EXERCISES 4.2 


4.2.1. Let Y have a chi-square distribution with 15 degrees of freedom. Find the following 
probabilities. 
(a) $P(Y \le y_0) = 0.025$
(b) $P(a < Y < b) = 0.95$
(c) $P(Y \ge 22.307)$.

4.2.2. Let Y have a chi-square distribution with 7 degrees of freedom. Find the following
probabilities.
(a) $P(Y > y_0) = 0.025$
(b) $P(a < Y < b) = 0.90$
(c) $P(Y > 1.239)$.


4.2.3. The time to failure T of a microwave oven has an exponential distribution with pdf 


1 
foO= xe, t>0. 


If three such microwave ovens are chosen and $\bar{T}$ is the mean of their failure times, find the
following:

(a) Distribution of $\bar{T}$.

(b) $P(\bar{T} > 2)$.


4.2.4. Let $X_1, X_2, \ldots, X_{10}$ be a random sample from a standard normal distribution. Find the
numbers $a$ and $b$ such that

$$P\left(a \le \sum_{i=1}^{10} X_i^2 \le b\right) = 0.95.$$

4.2.5. Let $X_1, X_2, \ldots, X_5$ be a random sample from the normal distribution with mean 55 and
variance 223. Let

$$Y = \sum_{i=1}^{5} (X_i - 55)^2 / 223$$

and

$$Z = \sum_{i=1}^{5} (X_i - \bar{X})^2 / 223.$$

(a) Find the distribution of the random variables Y and Z.
(b) Are Y and Z independent?
(c) Find (i) $P(0.62 < Y < 0.76)$, and (ii) $P(0.77 < Z < 0.95)$.


4.2.6. Let X and Y be independent chi-square random variables with 14 and 5 degrees of freedom, 
respectively. Find 
(a) $P(|X - Y| < 11.15)$,
(b) $P(|X - Y| \ge 3.8)$.


4.2.7. A particular type of vacuum-packed coffee packet contains an average of 16 ounces. It has
been observed that the number of ounces of coffee in these packets is normally distributed
with $\sigma = 1.41$ ounces. A random sample of 15 of these coffee packets is selected, and the
observations are used to calculate $s$. Find the numbers $a$ and $b$ such that $P(a < S^2 < b) = 0.90$.


4.2.8. An optical firm buys glass slabs to be ground into lenses, and it is known that the variance
of the refractive index of the glass slabs is to be no more than $1.04 \times 10^{-3}$. The firm rejects
a shipment of glass slabs if the sample variance of 16 pieces selected at random exceeds
$1.15 \times 10^{-3}$. Assuming that the sample values may be looked on as a random sample from
a normal population, what is the probability that a shipment will be rejected even though
$\sigma^2 = 1.04 \times 10^{-3}$?


4.2.9. Assume that T has a t-distribution with 8 degrees of freedom. Find the following 
probabilities. 
(a) P(T < 2.896) 
(b) P(T < —1.860) 
(c) The value of a such that P (—a < T < a) = 0.99 


4.2.10. Assume that T has a t-distribution with 15 degrees of freedom. Find the following
probabilities. 
(a) P(T < 1.341) 
(b) P(T > —2.131) 
(c) The value of a such that P (—a < T < a) = 0.95 


4.2.11. A psychologist claims that the mean age at which female children start walking is 11.4 
months. If 20 randomly selected female children are found to have started walking at a 
mean age of 11.5 months with standard deviation of 2 months, would you agree with the 
psychologist’s claim? Assume that the sample came from a normal population. 


4.2.12. Let $U_1$ and $U_2$ be independent random variables. Suppose that $U_1$ is $\chi^2$ with $\nu_1$ degrees of
freedom while $U = U_1 + U_2$ is chi-square with $\nu$ degrees of freedom, where $\nu > \nu_1$. Then
prove that $U_2$ is a chi-square random variable with $\nu - \nu_1$ degrees of freedom.


4.2.13. Show that if $X \sim \chi^2(\nu)$, then $EX = \nu$ and $\mathrm{Var}(X) = 2\nu$.


4.2.14. Let $X_1, \ldots, X_n$ be a random sample with $X_i \sim \chi^2(1)$, for $i = 1, \ldots, n$. Show that the
distribution of

$$Z = \frac{\bar{X} - 1}{\sqrt{2/n}}$$

as $n \to \infty$ is standard normal.

4.2.15. Find the variance of $S^2$, assuming the sample $X_1, X_2, \ldots, X_n$ is from $N(\mu, \sigma^2)$.

4.2.16. Let $X_1, X_2, \ldots, X_n$ be a random sample from an exponential distribution with parameter
$\theta$. Show that the random variable $2\theta^{-1}\left(\sum_{i=1}^{n} X_i\right) \sim \chi^2(2n)$.


4.2.17. Let X and Y be independent random variables from an exponential distribution with common
parameter $\theta = 1$. Show that $X/Y$ has an F-distribution. What is the number for degrees
of freedom?


4.2.18. Prove that if X has a t-distribution with n degrees of freedom, then $X^2 \sim F(1, n)$.

4.2.19. Let X be F distributed with 9 numerator and 12 denominator degrees of freedom. Find
(a) $P(X < 3.87)$,
(b) $P(X < 0.196)$,
(c) The value of $a$ and $b$ such that $P(a < X < b) = 0.95$.

4.2.20. Prove that if $X \sim F(n_1, n_2)$, then $1/X \sim F(n_2, n_1)$.

4.2.21. Find the mean and variance of an $F(n_1, n_2)$ random variable.


4.2.22. Let $X_{11}, X_{12}, \ldots, X_{1n_1}$ be a random sample with sample mean $\bar{X}_1$ from a normal population
with mean $\mu_1$ and variance $\sigma_1^2$, and let $X_{21}, X_{22}, \ldots, X_{2n_2}$ be a random sample with sample
mean $\bar{X}_2$ from a normal population with mean $\mu_2$ and variance $\sigma_2^2$. Assume the two samples
are independent. Show that the sampling distribution of $(\bar{X}_1 - \bar{X}_2)$ is normal with mean
$\mu_1 - \mu_2$ and variance $\sigma_1^2/n_1 + \sigma_2^2/n_2$.


4.2.23. Let $X_1, X_2, \ldots, X_{n_1}$ be a random sample from a normal population with mean $\mu_1$ and
variance $\sigma^2$, and $Y_1, Y_2, \ldots, Y_{n_2}$ be a random sample from an independent normal population
with mean $\mu_2$ and variance $\sigma^2$. Show that

$$T = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim T_{(n_1 + n_2 - 2)}.$$


4.2.24. Show that a t-distribution tends to a standard normal distribution as the degrees of freedom
tend to infinity.


4.2.25. Show that the mgf of a $\chi^2$ random variable with $\nu$ degrees of freedom is $M(t) = (1 - 2t)^{-\nu/2}$.
Using the mgf, show that the mean and variance of a chi-square distribution are $\nu$ and $2\nu$, respectively.


4.2.26. Let the random variables $X_1, X_2, \ldots, X_{10}$ be normally distributed with mean 8 and variance
4. Find a number $a$ such that

$$P\left(\sum_{i=1}^{10} \left(\frac{X_i - 8}{2}\right)^2 \le a\right) = 0.95.$$


4.2.27. Let $X^2 \sim F(1, n)$. Show that $X \sim t(n)$.


4.3 ORDER STATISTICS 


In practice, the random variables of interest may depend on the relative magnitudes of the observed 
variable. For example, we may be interested in the maximum mileage per gallon of a particular 
class of cars. In this section, we study the behavior of ordering a random sample from a continuous 
distribution. 


Definition 4.3.1 Let $X_1, \ldots, X_n$ be a random sample from a continuous distribution with pdf $f(x)$. Let
$Y_1, \ldots, Y_n$ be a permutation of $X_1, \ldots, X_n$ such that

$$Y_1 \le Y_2 \le \cdots \le Y_n.$$

Then the ordered random variables $Y_1, \ldots, Y_n$ are called the order statistics of the random sample
$X_1, \ldots, X_n$. Here $Y_k$ is called the kth order statistic. Because of continuity, the equality sign could be
ignored.

Remark. Although the $X_i$'s are iid random variables, the random variables $Y_i$'s are neither independent
nor identically distributed.


Thus, the minimum of the $X_i$'s is

$$Y_1 = \min(X_1, \ldots, X_n)$$

and the maximum is

$$Y_n = \max(X_1, \ldots, X_n).$$

The order statistics of the sample $X_1, X_2, \ldots, X_n$ can also be denoted by $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$, where

$$X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}.$$

Here $X_{(k)}$ is the kth order statistic and is equal to $Y_k$ in Definition 4.3.1. One of the most commonly
used order statistics is the median, the value in the middle position in the sorted order of the
values.

Example 4.3.1
(i) The range $R = Y_n - Y_1$ is a function of order statistics.
(ii) The sample median M equals $Y_{m+1}$ if $n = 2m + 1$.
Hence, the sample median M is an order statistic when n is odd. If n is even, then the sample median can
be obtained using the order statistics, $M = (1/2)\left[Y_{n/2} + Y_{(n/2)+1}\right]$.
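
For a concrete data set, the quantities in Example 4.3.1 follow directly from sorting. The sketch below is our illustration; the mileage figures are hypothetical values made up for this purpose.

```python
def order_statistics(sample):
    """Return Y_1 <= Y_2 <= ... <= Y_n for the observed sample."""
    return sorted(sample)

def sample_range(sample):
    """R = Y_n - Y_1."""
    y = sorted(sample)
    return y[-1] - y[0]

def sample_median(sample):
    """Y_{m+1} if n = 2m + 1; otherwise the average of Y_{n/2} and Y_{(n/2)+1}."""
    y = sorted(sample)
    n = len(y)
    if n % 2 == 1:
        return y[n // 2]                       # 0-based index m is Y_{m+1}
    return 0.5 * (y[n // 2 - 1] + y[n // 2])   # Y_{n/2} and Y_{(n/2)+1}

# Hypothetical miles-per-gallon readings (illustrative values only)
odd_sample = [21.5, 18.2, 24.9, 19.7, 22.3]
even_sample = [4, 1, 3, 2]

print(order_statistics(odd_sample))
print(sample_range(odd_sample), sample_median(odd_sample))
print(sample_median(even_sample))
```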


The following result is useful in determining the distribution of functions of more than one order
statistic.


Theorem 4.3.1 Let $X_1, \ldots, X_n$ be a random sample from a population with pdf $f(x)$. Then the joint pdf of
order statistics $Y_1, \ldots, Y_n$ is

$$f(y_1, \ldots, y_n) = \begin{cases} n!\, f(y_1) f(y_2) \cdots f(y_n), & \text{for } y_1 < \cdots < y_n \\ 0, & \text{otherwise.} \end{cases}$$


The pdf of the kth order statistic is given by the following theorem.
Theorem 4.3.2 The pdf of $Y_k$ is

$$f_k(y) = f_{Y_k}(y) = \frac{n!}{(k-1)!\,(n-k)!}\, f(y)\, [F(y)]^{k-1}\, [1 - F(y)]^{n-k},$$

for $-\infty < y < \infty$, where $F(y) = P(X_i \le y)$ is the cdf of $X_i$.


In particular, the pdf of $Y_1$ is $f_1(y) = n f(y) [1 - F(y)]^{n-1}$ and the pdf of $Y_n$ is
$f_n(y) = n f(y) [F(y)]^{n-1}$. In the following example, we will derive the pdf for $Y_k$.


Example 4.3.2
Let $X_1, \ldots, X_n$ be a random sample from $U[0, 1]$. Find the pdf of the kth order statistic $Y_k$.


Solution
Since the pdf of $X_i$ is $f(x) = 1$, $0 \le x \le 1$, the cdf is $F(x) = x$, $0 \le x \le 1$. Using Theorem 4.3.2, the pdf
of the kth order statistic $Y_k$ reduces to

$$f_k(y) = \frac{n!}{(k-1)!\,(n-k)!}\, y^{k-1} (1 - y)^{n-k}, \quad 0 \le y \le 1,$$

which is a beta distribution with $\alpha = k$ and $\beta = n - k + 1$.
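
The beta form can be checked by simulation, using the standard fact that a beta distribution with parameters $\alpha$ and $\beta$ has mean $\alpha/(\alpha + \beta)$, which here equals $k/(n+1)$. The following sketch is ours; the choices $n = 5$, $k = 2$, the seed, and the number of replications are arbitrary.

```python
import random

def simulate_kth_order_statistic(n, k, reps=20000, seed=2):
    """Draw reps values of Y_k, the kth smallest of n iid U[0, 1] variables."""
    rng = random.Random(seed)
    return [sorted(rng.random() for _ in range(n))[k - 1] for _ in range(reps)]

n, k = 5, 2
draws = simulate_kth_order_statistic(n, k)
sim_mean = sum(draws) / len(draws)
beta_mean = k / (n + 1)   # mean of Beta(k, n - k + 1)
print(sim_mean, beta_mean)
```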


The next example gives the so-called extreme (i.e., largest) value distribution, which is the distribution
of the order statistic $Y_n$.


Example 4.3.3
Find the distribution of the nth order statistic $Y_n$ of the sample $X_1, \ldots, X_n$ from a population with pdf
$f(x)$.


Solution
Let the cdf of $Y_n$ be denoted by $F_n(y)$. Then

$$F_n(y) = P(Y_n \le y) = P\left(\max_{1 \le i \le n} X_i \le y\right) = P(X_1 \le y, \ldots, X_n \le y) = [F(y)]^n \quad \text{(by independence)}.$$

Hence, the pdf $f_n(y)$ of $Y_n$ is

$$f_n(y) = \frac{d}{dy}[F(y)]^n = n[F(y)]^{n-1} \frac{d}{dy}F(y) = n[F(y)]^{n-1} f(y).$$

In particular, if $X_1, \ldots, X_n$ is a random sample from $U[0, 1]$, then the cumulative extreme value distribution
is given by

$$F_{Y_n}(y) = \begin{cases} 0, & y < 0 \\ y^n, & 0 \le y \le 1 \\ 1, & y > 1. \end{cases}$$

Example 4.3.4
A string of 10 light bulbs is connected in series, which means that the entire string will not light up if any
one of the light bulbs fails. Assume that the lifetimes of the bulbs, $\tau_1, \ldots, \tau_{10}$, are independent random
variables that are exponentially distributed with rate parameter 2 (that is, with mean 1/2). Find the distribution
of the life length of this string of light bulbs.

Solution
Note that the pdf of $\tau_i$ is $f(t) = 2e^{-2t}$, $0 < t < \infty$, and the cumulative distribution of $\tau_i$ is
$F_{\tau_i}(t) = 1 - e^{-2t}$. Let $T$ represent the lifetime of this string of light bulbs. Then,

$$T = \min(\tau_1, \ldots, \tau_{10}).$$

Thus,

$$F_T(t) = 1 - [1 - F_{\tau_i}(t)]^{10}.$$

Hence, the density of $T$ is obtained by differentiating $F_T(t)$ with respect to $t$, that is,

$$f_T(t) = 10 f_{\tau_i}(t) [1 - F_{\tau_i}(t)]^9 = \begin{cases} 2(10)e^{-2t} (e^{-2t})^9 = 20e^{-20t}, & 0 < t < \infty \\ 0, & \text{otherwise.} \end{cases}$$
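
The density $20e^{-20t}$ says that the string's lifetime is exponential with rate 20, so its mean is 1/20 and $P(T > t) = e^{-20t}$. The sketch below, which is ours, checks both by simulating the minimum of 10 lifetimes drawn with the pdf $f(t) = 2e^{-2t}$ used in the solution. The seed and the number of replications are arbitrary.

```python
import math
import random

def simulate_series_lifetimes(n_bulbs=10, rate=2.0, reps=20000, seed=3):
    """Lifetime of a series string is the minimum of the bulb lifetimes."""
    rng = random.Random(seed)
    return [min(rng.expovariate(rate) for _ in range(n_bulbs))
            for _ in range(reps)]

lifetimes = simulate_series_lifetimes()
sim_mean = sum(lifetimes) / len(lifetimes)
sim_tail = sum(1 for t in lifetimes if t > 0.05) / len(lifetimes)

theory_mean = 1 / 20                  # mean of an exponential with rate 20
theory_tail = math.exp(-20 * 0.05)    # P(T > 0.05) = e^{-1}
print(sim_mean, theory_mean)
print(sim_tail, theory_tail)
```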


The joint pdf of two order statistics is given by the following result.


Theorem 4.3.3 Let $X_1, \ldots, X_n$ be a random sample with continuous probability density function $f(x)$
and a distribution function $F(x)$. Let $Y_1, \ldots, Y_n$ be the order statistics. Then for any $1 \le i < k \le n$ and
$-\infty < x < y < \infty$, the joint pdf of $Y_i$ and $Y_k$ is given by

$$f_{Y_i, Y_k}(x, y) = \frac{n!}{(i-1)!\,(k-i-1)!\,(n-k)!}\, [F(x)]^{i-1}\, [F(y) - F(x)]^{k-i-1}\, [1 - F(y)]^{n-k}\, f(x) f(y).$$


Example 4.3.5
Let $X_1, \ldots, X_n$ be a random sample from $U[0, 1]$. Find the joint pdf of $Y_2$ and $Y_5$.


Solution
Taking $i = 2$ and $k = 5$ in Theorem 4.3.3, we get the joint pdf of $Y_2$ and $Y_5$ as

$$f_{Y_2, Y_5}(x, y) = \frac{n!}{(2-1)!\,(5-2-1)!\,(n-5)!}\, [F(x)]^{2-1}\, [F(y) - F(x)]^{5-2-1}\, [1 - F(y)]^{n-5}\, f(x) f(y)$$

$$= \begin{cases} \dfrac{n!}{2\,(n-5)!}\, x (y - x)^2 (1 - y)^{n-5}, & 0 < x < y < 1 \\ 0, & \text{otherwise.} \end{cases}$$

EXERCISES 4.3


4.3.1. The lifetime X of a certain electrical fuse has the following probability density function

$$f(x) = \begin{cases} \dfrac{1}{10} e^{-x/10}, & x > 0 \\ 0, & \text{otherwise.} \end{cases}$$

Suppose two such fuses are in series and operate independently in a system. Find the pdf of
the lifetime Y of the system. (The system will work only if both the fuses operate.)


4.3.2. Suppose that time between two telephone calls at an office, in minutes, is uniformly dis- 
tributed on the interval [0, 20]. If there were 15 calls, (i) what is the probability that the 
longest time interval between calls is less than 15 minutes? (ii) What is the probability that 
the shortest time interval between calls is greater than 5 minutes? 


4.3.3. Let $X_1, X_2, X_3$ be three random variables of discrete type. Let $X_1, X_2$ take values 0, 1, and
$X_3$ take values 1, 2, 3. What are the values of $Y_1, Y_2, Y_3$?


4.3.4. Let $X_1, \ldots, X_{10}$ be a random sample from $U[0, 1]$. Find the joint density of $Y_2$ and $Y_7$,
where $Y_i$, $i = 1, 2, \ldots, 10$, are order statistics of $X_1, \ldots, X_{10}$.

4.3.5. Let $X_1, \ldots, X_n$ be a random sample from an exponential distribution with a mean of $\theta$. Show
that $Y_1 = \min(X_1, X_2, \ldots, X_n)$ has an exponential distribution with mean $\theta/n$. Also, find
the pdf of $Y_n = \max(X_1, X_2, \ldots, X_n)$.


4.3.6. A string of 10 light bulbs is connected in parallel, which means that the entire string will
fail to light up only if all 10 of the light bulbs fail. Assume that the lifetimes of the bulbs,
$\tau_1, \ldots, \tau_{10}$, are independent random variables that are exponentially distributed with mean
$\theta$. Find the distribution of the lifetimes of this string of light bulbs.


4.3.7. Let $X_1, \ldots, X_n$ be a random sample from the uniform distribution $f(x) = 1/2$, $0 \le x \le 2$.
Find the probability density function for the range $R = X_{(n)} - X_{(1)}$.


4.3.8. Given a sample of 25 observations from a distribution with pdf

$$f(x) = \begin{cases} e^{-x}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$$

let M be the sample median. Compute $P(M > b)$.
[Hint: Note that M is the 13th order statistic.]


4.3.9. Let $X_1, \ldots, X_n$ be a random sample from a normal population with mean 10 and variance
4. What is the probability that the largest observation is greater than 10?


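4.3.10. Let $X_1, \ldots, X_n$ be a random sample from an exponential population with parameter $\theta$. Let
$Y_1, \ldots, Y_n$ be the ordered random variables.
(a) Show that the sampling distributions of $Y_1$ and $Y_n$ are given by

$$f_1(y_1) = \begin{cases} \dfrac{n}{\theta} e^{-n y_1/\theta}, & \text{if } y_1 > 0 \\ 0, & \text{otherwise,} \end{cases}$$

and

$$f_n(y_n) = \begin{cases} \dfrac{n}{\theta} e^{-y_n/\theta} \left[1 - e^{-y_n/\theta}\right]^{n-1}, & \text{if } y_n > 0 \\ 0, & \text{otherwise.} \end{cases}$$

(b) Let $n = 2l + 1$. Show that the sampling distribution of the median, M, is given by

$$f_M(m) = \begin{cases} \dfrac{(2l+1)!}{(l!)^2}\, \dfrac{1}{\theta}\, e^{-(l+1)m/\theta} \left[1 - e^{-m/\theta}\right]^{l}, & \text{if } m > 0 \\ 0, & \text{otherwise.} \end{cases}$$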


4.3.11. Let $X_1, \ldots, X_n$ be a random sample from a beta distribution with $\alpha = 2$ and $\beta = 3$. Find
the joint pdf of $Y_1$ and $Y_n$.


4.3.12. Let $X_1, \ldots, X_n$ be a random sample from a geometric distribution with pmf
$p_i = P(X = i) = p q^{i-1}$, $i = 1, 2, \ldots$, $0 < p < 1$, $q = 1 - p$.


Show that 


n 
i. en . ae 
ra=v=D(") 4 DOO grt — PF 1 -@ 4, 
i=k ' 
pe by ee 


4.4 LARGE SAMPLE APPROXIMATIONS 


If the sample size is large, the normality assumption on the underlying population can be relaxed. 
A useful generalization of Corollary 4.2.2 follows. 


Theorem 4.4.1 Suppose that the population (not necessarily normal) from which samples are taken has
a probability distribution with mean $\mu$ and variance $\sigma^2$. Then the standardized variable (or z-transform)
associated with $\bar{X}$, given by

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}},$$

is asymptotically standard normal. That is,

$$\lim_{n \to \infty} P(Z \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-u^2/2}\, du.$$

Theorem 4.4.1 follows directly from the Central Limit Theorem. The consequence of this for statistics
is that, regardless of the form of the population distribution, the distribution of the z-transform of
a sample mean $\bar{X}$ will be approximately a standard normal random variable whenever n is large.
This fact will be used in almost all large sample inference problems. It is important to note that, by
Theorem 4.2.2, if the random sample came from a normal population, then the sampling distribution of
the mean is normally distributed regardless of the size of the sample. We could use the foregoing results
if the population variance $\sigma^2$ is known or when the sample size is large. Even though the required
sample size to apply Theorem 4.4.1 will depend on the particular distribution of the population, for
practical purposes we will consider the sample size to be large enough if $n > 30$.
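
To see the theorem at work on a clearly non-normal population, one can standardize simulated sample means and compare a central probability with its standard normal value. The sketch below is ours: the exponential population with mean 1 (so $\mu = \sigma = 1$), the sample size $n = 30$, and the seed are illustrative assumptions.

```python
import math
import random

def standardized_means(n, reps=20000, seed=4):
    """z-transforms of sample means from an exponential population with mean 1."""
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0   # mean and standard deviation of Exp(rate = 1)
    values = []
    for _ in range(reps):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        values.append((xbar - mu) / (sigma / math.sqrt(n)))
    return values

z_values = standardized_means(30)
# Under the standard normal distribution, P(-1.96 <= Z <= 1.96) is about 0.95
coverage = sum(1 for z in z_values if abs(z) <= 1.96) / len(z_values)
print(coverage)
```

Repeating the experiment with a much smaller n (say n = 3) shows a visibly worse agreement, which is why a rule of thumb for the sample size is needed.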


Example 4.4.1
The average SAT score for freshmen entering a particular university is 1100 with a standard deviation of
95. What is the probability that the mean SAT score for a random sample of 50 of these freshmen will be
anywhere from 1075 to 1110?


Solution
The distribution of $\bar{X}$ has the mean $\mu_{\bar{X}} = 1100$ and $\sigma_{\bar{X}} = 95/\sqrt{50}$. By Theorem 4.4.1,
$\bar{X}$ is approximately $N(1100, \sigma_{\bar{X}} = 95/\sqrt{50})$. The z-scores corresponding to 1075 and 1110 are

$$z = \frac{1075 - 1100}{95/\sqrt{50}} = -1.8608 \quad \text{and} \quad z = \frac{1110 - 1100}{95/\sqrt{50}} = 0.74432.$$

Hence

$$P(1075 \le \bar{X} \le 1110) = P(-1.8608 \le Z \le 0.74432) = 0.739,$$

which means that there is approximately a 73.9% chance that the mean SAT score of such a sample is between
1075 and 1110, inclusive.
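
The arithmetic of Example 4.4.1 can be reproduced without a normal table. The sketch below is ours; it uses the error function from the Python standard library for the standard normal cdf. Because it does not round the z-values to two decimals, its answer (about 0.740) differs slightly from the table-based 0.739.

```python
import math

def normal_cdf(z):
    """Standard normal cdf expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sigma, n = 1100.0, 95.0, 50
standard_error = sigma / math.sqrt(n)      # sigma of the sample mean

z_low = (1075 - mu) / standard_error
z_high = (1110 - mu) / standard_error
prob = normal_cdf(z_high) - normal_cdf(z_low)
print(round(z_low, 4), round(z_high, 5), round(prob, 4))
```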


4.4.1 The Normal Approximation to the Binomial Distribution 


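We know that a binomial random variable Y, with parameters n and p = P(success), can be viewed
as the number of successes in n trials and can be written as

$$Y = \sum_{i=1}^{n} X_i,$$

where

$$X_i = \begin{cases} 1, & \text{with probability } p \\ 0, & \text{with probability } (1 - p). \end{cases}$$

The fraction of successes in n trials is

$$\frac{Y}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X}.$$

Hence, $Y/n$ is a sample mean. Since $E(X_i) = p$ and $\mathrm{Var}(X_i) = p(1 - p)$, we have

$$E\left(\frac{Y}{n}\right) = E\left(\frac{1}{n} \sum_{i=1}^{n} X_i\right) = p$$

and

$$\mathrm{Var}\left(\frac{Y}{n}\right) = \frac{1}{n^2} \sum_{i=1}^{n} \mathrm{Var}(X_i) = \frac{p(1 - p)}{n}.$$

[Figure 4.7: Probability function of a discrete random variable.]
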

Because $Y = n\bar{X}$, by the Central Limit Theorem, Y has an approximate normal distribution with
mean $\mu = np$ and variance $\sigma^2 = np(1 - p)$. Because the calculation of the binomial probabilities
is cumbersome for large sample sizes n, the normal approximation to the binomial distribution is
widely used. A useful rule of thumb for the normal approximation to the binomial distribution
is to consider n large enough if $np > 5$ and $n(1 - p) > 5$. Otherwise, the binomial distribution
may be so asymmetric that the normal distribution may not provide a good approximation. Other
rules, such as $np > 10$ and $n(1 - p) > 10$, or $np(1 - p) > 10$, are also used in the literature. Because
all of these rules are only approximations, for consistency's sake we will use $np > 5$ and $n(1 - p) > 5$
to test for largeness of sample size in the normal approximation to the binomial distribution. If the need
arises, we could use the more stringent condition $np(1 - p) > 10$.


Recall that discrete random variables take no values between integers, and their probabilities are 
concentrated at the integers as shown in Figure 4.7. However, the normal random variables have 
zero probability at these integers; they have nonzero probability only over intervals. Because we 
are approximating a discrete distribution with a continuous distribution, we need to introduce a 
correction factor for continuity which is explained below. 


CORRECTION FOR CONTINUITY FOR THE NORMAL APPROXIMATION TO THE BINOMIAL
DISTRIBUTION


(a) To approximate $P(X \le a)$ or $P(X > a)$, the correction for continuity is $(a + 0.5)$, that is,

$$P(X \le a) = P\left(Z < \frac{(a + 0.5) - np}{\sqrt{np(1 - p)}}\right)$$

and

$$P(X > a) = P\left(Z > \frac{(a + 0.5) - np}{\sqrt{np(1 - p)}}\right).$$

(b) To approximate $P(X \ge a)$ or $P(X < a)$, the correction for continuity is $(a - 0.5)$, that is,

$$P(X \ge a) = P\left(Z > \frac{(a - 0.5) - np}{\sqrt{np(1 - p)}}\right)$$

and

$$P(X < a) = P\left(Z < \frac{(a - 0.5) - np}{\sqrt{np(1 - p)}}\right).$$

(c) To approximate $P(a \le X \le b)$, treat ends of the intervals separately, calculating two distinct
z-values according to steps (a) and (b), that is,

$$P(a \le X \le b) = P\left(\frac{(a - 0.5) - np}{\sqrt{np(1 - p)}} < Z < \frac{(b + 0.5) - np}{\sqrt{np(1 - p)}}\right).$$

(d) Use the normal table to obtain the approximate probability of the binomial event.


[Figure 4.8: Continuity correction for $P(X = i)$; the horizontal axis marks $i - 1/2$ and $i + 1/2$ under the curve $f(x)$.]

The shaded area in Figure 4.8 represents the continuity correction for $P(X = i)$.


Example 4.4.2
A study of parallel interchange ramps revealed that many drivers do not use the entire length of parallel
lanes for acceleration, but seek, as soon as possible, a gap in the major stream of traffic to merge. At one
site on Interstate Highway 75, 46% of drivers used less than one third of the lane length available before
merging. Suppose we monitor the merging pattern of a random sample of 250 drivers at this site.
(a) What is the probability that fewer than 120 of the drivers will use less than one third of the
acceleration lane length before merging?
(b) What is the probability that more than 225 of the drivers will use less than one third of the
acceleration lane length before merging?

Solution
First we check for adequacy of the sample size:

$$np = (250)(0.46) = 115 \quad \text{and} \quad n(1 - p) = (250)(1 - 0.46) = 135.$$

Both are greater than 5. Hence, we can use the normal approximation. Let X be the number of drivers using
less than one third of the lane length available before merging. Then X can be considered to be a binomial
random variable. Also,

$$\mu = np = (250)(0.46) = 115.0$$

and

$$\sigma = \sqrt{np(1 - p)} = \sqrt{250(0.46)(0.54)} = 7.8804.$$

(a) $P(X < 120) = P\left(Z < \dfrac{119.5 - 115}{7.8804} = 0.57103\right) = 0.7157$, that is, we are approximately 71.57%
certain that fewer than 120 drivers will use less than one third of the acceleration lane length before
merging.

(b) $P(X > 225) = P\left(Z > \dfrac{225.5 - 115}{7.8804} = 14.02213\right) \approx 0$, that is, there is almost no chance that
more than 225 drivers will use less than one third of the acceleration lane length before merging.
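
For a sample this size the exact binomial probabilities are easy to compute by machine, so the quality of the approximation can be inspected directly. The sketch below is ours; the helper names `normal_cdf` and `binom_cdf` are not from any package. It compares the continuity-corrected normal values of Example 4.4.2 with exact binomial sums.

```python
import math

def normal_cdf(z):
    """Standard normal cdf expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k + 1))

n, p = 250, 0.46
mu = n * p
sigma = math.sqrt(n * p * (1 - p))

# (a) P(X < 120) = P(X <= 119); continuity correction uses 120 - 0.5
approx_a = normal_cdf((119.5 - mu) / sigma)
exact_a = binom_cdf(119, n, p)

# (b) P(X > 225); continuity correction uses 225 + 0.5
approx_b = 1.0 - normal_cdf((225.5 - mu) / sigma)
exact_b = 1.0 - binom_cdf(225, n, p)

print(round(approx_a, 4), round(exact_a, 4))
print(approx_b, exact_b)
```

The normal value in part (a) comes out near 0.716 rather than 0.7157 because the code does not round z to two decimals as the table lookup does.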


EXERCISES 4.4 


4.4.1. A random sample of size 150 is taken from an infinite population having mean $\mu = 8$ and
variance $\sigma^2 = 4$. What is the probability that $\bar{X}$ will be between 7.5 and 10?


4.4.2. A machine that is used to fill bottles with soda has been observed to have a true standard
deviation in the amounts of fill of approximately $\sigma = 1.25$ ounces. However, the mean
ounces of fill $\mu$ may change from day to day, because of change of operator or adjustments
in the machine. If n = 55 observations on ounces of fill are taken on a given day, find the
probability that the sample mean will be within 0.5 ounce of the true population mean.
State any assumptions.


4.4.3. The times spent by customers coming to a certain gas station to fill up can be viewed as 
independent random variables with a mean of 3 minutes and a variance of 1.5 minutes. 
Approximate the probability that a random sample of 75 customers in this gas station will 
spend a total time less than 3 hours. Interpret your results and state any assumptions. 


4.4.4. Refer to Exercise 4.4.3. Find the number of customers, m, such that the probability that all 
the m customers can fill up in less than 3 hours is approximately 0.2. 


4.4.5. In the mathematics department of a certain university, in a particular semester, 1250 students
took the elementary algebra final examination. The mean was 69% with a standard deviation
of 5.4%. If a random sample of 60 students is selected from this population, what is the
probability that the average score of this sample will be at most 75.08? Interpret your results
and state any assumptions.


4.4.6. For a newborn full-term infant, the weight appropriate for gestational age is assumed to be
normally distributed with $\mu = 3025$ grams and $\sigma = 165$ grams. Compute the probability
that a random sample of 50 infants born at full term results in a sample mean of less than
3500 grams.

4.4.7. Let $X_1, \ldots, X_n$ be a random sample, each with mean $\mu_1$ and standard deviation $\sigma_1$.
Also, let $Y_1, Y_2, \ldots, Y_m$ be a random sample, each with mean $\mu_2$ and a standard deviation
$\sigma_2$. Assume that both the samples are from normal populations. Verify that

$$(\bar{X} - \bar{Y}) \sim N\left(\mu_1 - \mu_2,\ \tfrac{1}{n}\sigma_1^2 + \tfrac{1}{m}\sigma_2^2\right).$$


4.4.8. Let $X_1, \ldots, X_n$ be a random sample, each with mean $\mu_1$ and standard deviation $\sigma_1$. Also,
let $Y_1, Y_2, \ldots, Y_n$ be a random sample independent of $X_1, \ldots, X_n$, each with mean $\mu_2$ and
a standard deviation $\sigma_2$. Prove that the random variable

$$V_n = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2 + \sigma_2^2}{n}}}$$

satisfies the conditions of Theorem 4.4.1 and hence $V_n$ is asymptotically normal.


4.4.9. Suppose X is a binomial random variable with n = 20 and p = 0.2. Find the probability 
that X < 10 using binomial tables and compare this to the corresponding value found from 
normal approximation. 


4.4.10. Using normal approximation, find the probability of obtaining 90 heads in 150 tosses of a 
fair coin. Is the normal approximation valid? Why? 


4.4.11. A car rental company finds that each day 6% of the persons making reservations will not 
show up. If the rental company reserves for 215 persons with only 200 automobiles, what is 
the probability that an automobile will be available for every person who shows up holding 
a reservation? (Use the normal approximation.) 


4.4.12. The president of the United States is thought to have a positive approval rating of 58% of 
the people at a certain time. In a random sample of 1200 people, what is the approximate 
probability that the number of positive approvals will be at least 750? Interpret your results 
and state any assumptions. 


4.4.13. In the United States, sudden infant death syndrome (SIDS) is one of the leading causes of 
postneonatal deaths (those occurring between the ages of 28 days and 1 year). Thus far, the 
most significant risk factor discovered for SIDS is placing babies to sleep in a prone position 
(on their stomachs). Suppose the rate of death due to SIDS is 0.00103 per year. In a random 
sample of 5000 infants between the ages of 28 days and 1 year, what is the approximate 
probability that the number of SIDS-related deaths will be at least 10? Interpret your results 
and state any assumptions. 

4.4.14. Let X and Y be independent binomial random variables with parameters $(n, p_1)$ and $(m, p_2)$,
respectively.

(a) Find $E\left(\dfrac{X}{n} - \dfrac{Y}{m}\right)$.

(b) Find $\mathrm{Var}\left(\dfrac{X}{n} - \dfrac{Y}{m}\right)$.

(c) Show that $\left(\dfrac{X}{n} - \dfrac{Y}{m}\right) \sim N\left(E\left(\dfrac{X}{n} - \dfrac{Y}{m}\right),\ \mathrm{Var}\left(\dfrac{X}{n} - \dfrac{Y}{m}\right)\right)$ for large n.


4.5 CHAPTER SUMMARY 


In this chapter, we learned about sampling distributions. In sampling distributions associated with 
normal populations, we have seen that we can generate chi-square, t-, and F-distributions. In 
Section 4.3 we dealt with order statistics. Then in Section 4.4 we looked at large sample approxi- 
mations such as the normal approximation to the binomial distribution. In the following section, 
we will give Minitab examples to show how the idea of sampling distribution can be explored using 
statistical software. 


We will now list some of the key definitions introduced in this chapter. 


■ Sampling distribution
■ Sample and sample size
■ Random sample
■ Statistic
■ Standard error
■ Finite population correction factor
■ Degrees of freedom
■ t-distribution
■ F-distribution
■ Order statistics


In this chapter, we have also presented the following important concepts and procedures: 


■ Sampling distribution associated with normal distribution
■ Results on chi-square distribution
■ Results on Student t-distribution
■ Results on F-distribution
■ Derivation of probability density functions for order statistics
■ Large sample approximations
■ Normal approximation to the binomial
■ Correction for continuity for the normal approximation to the binomial distribution


4.6 COMPUTER EXAMPLES
4.6.1 Minitab Examples


Example 4.6.1 
Create three samples of size 30 from standard normal distribution using Minitab, and draw histograms for 
each sample. 


Solution
We can use the following procedure:
1. Open a new worksheet.
2. Choose Calc > Random Data > Normal.
3. Generate 30 rows of data.
4. Store results in C1-C3.
5. Enter a mean of 0 and a standard deviation of 1 and click OK.
6. Choose Graph > Character Graphs > Histogram and enter C1-C3 in the variable box and click OK.
We will not give the data or any of the three histograms that we will get. These histograms are just
lines containing *'s. If we need actual histograms, in step 6 use
Graph > Histogram and enter C1 in the graph variable box and click OK.


If we wish to generate descriptive statistics, then
7. Choose Stat > Basic Statistics > Display Descriptive Statistics..., enter C1-C3 in the variable box,
and click OK.
If we would like to see the mean for the three samples,
8. Choose Calc > Row Statistics, then click Mean and in Input variables type C1-C3. In Store result
in, type C4 and click OK.
To see the histogram of these averages, follow step 6 with C4 in the graph variable box.
Using a similar procedure, one could generate samples from normal distributions with different means
and standard deviations, as well as from other distributions.


4.6.2 SPSS Examples 


If we have the full version of SPSS, we can write code that can be used to simulate a sampling 
distribution with different values of p. However, with the student version, it is not easy to simulate. 
Therefore, we will not give SPSS examples in this chapter. 


4.6.3 SAS Examples 


Example 4.6.2
Generate 50,000 observations from a normal distribution with mean 30 and standard deviation 8. Obtain
summary statistics for these data and draw a graph.

Solution
We could use the following program.


title '50000 Obs Sample from a Normal Distribution';
title2 'with Mean=30 and Standard Deviation=8';

/* Generate 50000 values: 8*Z + 30, where Z is a standard normal deviate */
data normaldat;
   do n=1 to 50000;
      X=8*rannor(55)+30;   /* 55 is the seed */
      output;
   end;
run;

/* Summary statistics for X */
proc univariate data=normaldat;
   var X;
run;

/* Vertical bar chart of X */
proc chart;
   vbar x / midpoints=6 to 54 by 2;
   /* msd. is assumed to be a user-defined format; remove this line if it is not defined */
   format x msd.;
run;


In the foregoing program, in rannor(55), the number 55 is just a seed number to obtain the same series of
random numbers each time we run the program. If we use '0', each time we run the program we will get a
different set of random numbers. We will not give the output.



Example 4.6.3 
From an exponential distribution, draw 10,000 samples, each sample of size 15. Compute the mean of each 
sample and draw a chart for the means. This will be an approximate sampling distribution of X for a fixed 
sample of size 15. 


Solution
Use the following program.


title '10000 Sample Means with 15 Obs per Sample';
title2 'Drawn from an Exponential Distribution';

/* 10000 samples, each of 15 exponential observations (seed 3) */
data sample15;
   do Sample=1 to 10000;
      do n=1 to 15;
         X=ranexp(3);
         output;
      end;
   end;
run;

/* One mean per sample, stored in data set mean15 */
proc means data=sample15 noprint;
   output out=mean15 mean=Mean;
   var X;
   by sample;
run;

/* Chart of the 10000 sample means */
proc chart data=mean15;
   vbar mean/axis=1800
   midpoints=0.10 to 2.05 by .1;
run;

/* Summary statistics and normality tests for the sample means */
proc univariate data=mean15 nextrobs=0 normal
   mu0=1;
   var mean;
run;


This will produce an approximate sampling distribution of $\bar{X}$. We will not give the output.


PROJECTS FOR CHAPTER 4 
4A. A Method to Obtain Random Samples from Different Distributions 


Most of the statistical software packages contain a random number generator that produces approx- 
imations to random numbers from the uniform distribution U [0, 1]. To simulate the observation 
of any other continuous random variables, we can start with uniform random numbers and asso- 
ciate these to the distribution we want to simulate. For example, suppose we wish to simulate an 
observation from the exponential distribution 


$$F(x) = 1 - e^{-0.5x}, \quad 0 < x < \infty.$$

First produce the value of $y$ from the uniform distribution. Then solve for $x$ from the equation

$$y = F(x) = 1 - e^{-0.5x}.$$

So $x = [-\ln(1 - y)]/0.5$ is the corresponding value of the exponential random variable. For instance,
if $y = 0.67$, then $x = [-\ln(1 - y)]/0.5 = 2.2173$. If we wish to simulate a sample from the distribution
$F$ from the different values of $y$ obtained from the uniform distribution, the procedure is repeated
for each new observation $x$.
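The inversion step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `exponential_from_uniform`, the seed, and the sample size are our own choices, and Python's standard `random` module stands in for the uniform generator of a statistical package.

```python
import math
import random

def exponential_from_uniform(y, rate):
    """Solve y = 1 - exp(-rate * x) for x, given a uniform value y in [0, 1)."""
    return -math.log(1.0 - y) / rate

# The worked value from the text: y = 0.67 with F(x) = 1 - exp(-0.5 x).
x_example = exponential_from_uniform(0.67, 0.5)

# Repeating the procedure: one uniform draw per simulated observation.
rng = random.Random(2024)  # arbitrary seed, fixed only for reproducibility
sample = [exponential_from_uniform(rng.random(), 0.5) for _ in range(10)]
```

Here `x_example` reproduces the value 2.2173 computed above, and `sample` holds ten simulated exponential observations.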


(a) Simulate 10 observations of a random variable having exponential distribution with mean 
and standard deviation both equal to 2. 

(b) Select 1500 random samples of size n = 10 measurements from a population with an expo- 
nential distribution with mean and standard deviation both equal to 2. Calculate sample 
mean for each of these 1500 samples and draw a relative frequency histogram. Based on 
Theorems 4.1.1 and 4.4.1, what can you conclude? 

It should be noted that, in general, if $Y \sim U(0, 1)$, then we can show that $X = -(\ln Y)/\lambda$
is an exponential random variable with parameter $\lambda$, that is, with $F(x) = 1 - e^{-\lambda x}$. Uniform random variables can also
be used to generate random variables from other distributions. For example, let the $U_i$ be iid $U[0, 1]$
random variables. Then,

$$X = -2 \sum_{i=1}^{\nu} \ln(U_i) \sim \chi^2_{2\nu},$$

and

$$Y = -\beta \sum_{i=1}^{\alpha} \ln(U_i) \sim \text{Gamma}(\alpha, \beta).$$

Of course, these transformations are useful only when $\nu$ and $\alpha$ are integers. More efficient methods,
such as MCMC methods, are discussed in Chapter 13.
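As a rough check of these two transformations, the following Python sketch simulates both and compares the simulated means with the theoretical means, $2\nu$ for the chi-square and $\alpha\beta$ for the gamma. The function names, seed, and parameter values are illustrative assumptions. We apply the logarithm to $1 - U$ rather than $U$, which has the same uniform distribution but can never be zero.

```python
import math
import random

def chi_square_even_df(nu, rng):
    """X = -2 * sum of nu log-uniforms; chi-square with 2 * nu degrees of freedom."""
    return -2.0 * sum(math.log(1.0 - rng.random()) for _ in range(nu))

def gamma_integer_shape(alpha, beta, rng):
    """Y = -beta * sum of alpha log-uniforms; Gamma(alpha, beta), integer alpha."""
    return -beta * sum(math.log(1.0 - rng.random()) for _ in range(alpha))

rng = random.Random(5)  # arbitrary seed
draws = 20000

# nu = 3 gives 6 degrees of freedom, so the theoretical mean is 6.
chi_mean = sum(chi_square_even_df(3, rng) for _ in range(draws)) / draws

# alpha = 3, beta = 2 gives theoretical mean alpha * beta = 6.
gamma_mean = sum(gamma_integer_shape(3, 2.0, rng) for _ in range(draws)) / draws
```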


4B. Simulation Experiments 


When the derivation via probability rules is too difficult or complicated to be carried out, one can use 
simulation experiments to obtain information about a statistic’s sampling distribution. The following 
characteristics of the experiment must be specified: 


(i) The population distribution (normal with $\mu = 10$ and $\sigma = 2$, exponential with $\lambda = 5$, etc.)
(ii) The sample size $n$ and the statistic of interest ($\bar{X}$, $S$, etc.)
(iii) The number of replications $k$ (such as $k = 300$)


Then, using a computer program, obtain $k$ different random samples, each of size $n$, from the designated
population distribution. Calculate the value of the statistic for each of the $k$ replications.
Construct a histogram of these $k$ values of the statistic. This histogram gives the approximate sampling distribution
of the statistic. The larger the value of $k$, the better the approximation will be.
(a) For your simulation study, use a normal population distribution with $\mu = 3.4$ and
$\sigma = 1.2$.
For $n = 8$, perform $k = 500$ replications and draw a histogram of the values of the sample means.
Repeat the experiment with $n = 15$, $n = 25$, and $n = 35$ and draw the histograms. Based on
this exercise, you will be able to verify intuitively the result that $\bar{X}$ based on a large $n$ tends to
be closer to $\mu$ than does $\bar{X}$ based on a small $n$.
(b) Repeat the experiment of part (a) with different values of $k$, such as $k = 200$, $k = 750$, and
$k = 1000$.
(c) Repeat the simulation study with different distributions, such as the exponential distribution.
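One possible way to organize the experiment of part (a) is sketched below in Python. It is not a prescribed solution: the function name `simulate_sample_means`, the seed, and the use of the standard deviation of the simulated means as a summary (in place of a histogram) are our own assumptions.

```python
import random
import statistics

def simulate_sample_means(n, k, mu=3.4, sigma=1.2, seed=1):
    """Return k sample means, each computed from a normal sample of size n."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(k)
    ]

# Spread of the k = 500 simulated means for each sample size in part (a).
spread = {}
for n in (8, 15, 25, 35):
    means = simulate_sample_means(n, k=500)
    spread[n] = statistics.stdev(means)
    print(n, round(statistics.fmean(means), 3), round(spread[n], 3))
```

The list `means` is what one would pass to a histogram routine. The printed spread should shrink as $n$ grows, which is the intuitive content of part (a).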


4C. A Test for Normality


Many statistical procedures require that the population be at least approximately normal. Therefore, a 
procedure is needed for checking that the sampled data could have come from a normal distribution. 
There are many procedures, such as the normal-score plot, or Lilliefors test for normality, available 
in statistics for this purpose. We will describe the normal-score plot, which is an effective way to detect 
deviations from normality. The normal scores consist of values of $z$ that divide the axis into equal
probability intervals. For a sample of size 4, the normal scores are $-z_{0.20} = -0.84$, $-z_{0.40} = -0.25$,
$z_{0.40} = 0.25$, and $z_{0.20} = 0.84$.

STEPS TO CONSTRUCT A NORMAL PLOT

1. Rearrange the $n$ data points in ascending order.

2. Obtain the $n$ normal scores.

3. Plot the $k$th smallest observation versus the $k$th normal score, for all $k$.

4. If the data were from a standard normal distribution, the plot would resemble a 45 degree line
through the origin.

5. If the observations were from a normal (but not standard normal) distribution, the pattern should still be a
straight line. However, the line need not pass through the origin or have slope 1.


In applications, a minimum of 15 to 20 observations is needed to reach a more accurate conclusion. 
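The normal scores and the plotting pairs can be computed as in the following Python sketch. We assume here that the $n$ scores are the standard normal quantiles at probabilities $k/(n+1)$, $k = 1, \ldots, n$, which is the choice that reproduces the four scores quoted above for $n = 4$; other conventions exist, and the function names are our own.

```python
from statistics import NormalDist

def normal_scores(n):
    """Standard normal quantiles at k / (n + 1), for k = 1, ..., n."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(k / (n + 1)) for k in range(1, n + 1)]

def normal_plot_points(data):
    """Pair the kth normal score with the kth smallest observation."""
    return list(zip(normal_scores(len(data)), sorted(data)))

scores = normal_scores(4)
```

Plotting the pairs returned by `normal_plot_points` gives the normal plot described in the steps above.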


EXERCISES 
1. For different observations, construct normal plots and check for normality of the corresponding 
populations. 


2. Using software (such as Minitab), generate 15 observations each from the following distributions: 
(a) Normal (2, 4), (b) Uniform (0, 1), (c) Gamma (2, 4), and (d) Exponential (2). 
For each of these data sets, draw a probability plot and note the geometry of the plots. 
Point Estimation


Objective: In this chapter we study some statistical methods to find point estimators of population 
parameters and study their properties. 
C. R. Rao
(Source: http://www.science.psu.edu/alert/Rao6-2007.htm)


Calyampudi Radhakrishna (C. R.) Rao (1920-) is a contemporary statistician whose work has influenced
not just statistics, but such diverse fields as anthropology, biometry, demography, economics,
genetics, geology, and medicine. Several statistical terms and equations are named after Rao. He has
worked with many other famous statisticians such as Blackwell, Fisher, and Neyman and has had 
dozens of theorems named after him. Rao earned an M.A. in mathematics and another M.A. in statis- 
tics, both in India, and earned his Ph.D. and Sc.D. at Cambridge University. The following was stated 
in the Preface to the 1991 special issue of the Journal of Quantitative Economics in Rao’s honor: “Dr. Rao 
is a very distinguished scientist and a highly eminent statistician of our time. His contributions to 
statistical theory and applications are well known, and many of his results, which bear his name, 
are included in the curriculum of courses in statistics at bachelor’s and master’s level all over the 
world. He is an inspiring teacher and has guided the research work of numerous students in all areas 
of statistics. His early work had greatly influenced the course of statistical research during the last 
four decades. One of the purposes of this special issue is to recognize Dr. Rao’s own contributions to 
econometrics and acknowledge his major role in the development of econometric research in India.” 
The importance of statistics can be summarized in Rao’s own words: “If there is a problem to be 
solved, seek statistical advice instead of appointing a committee of experts. Statistics can throw more 
light than the collective wisdom of the articulate few.” 


5.1 INTRODUCTION 


In statistical analysis, point estimation of population parameters plays a very significant role. In 
studying a real-world phenomenon we begin with a random sample of size n taken from the totality 
of a population. The initial step in statistically analyzing these data is to be able to identify the 
probability distribution that characterizes this information. Because the parameters of a distribution 
are its defining characteristics, it becomes necessary to know the parameters. In the present chapter, 
we assume that the form of the population distribution is known (binomial, normal, etc.) but the 
parameters of the distribution ($p$ for a binomial; $\mu$ and $\sigma^2$ for a normal, etc.) are unknown. We shall
estimate these parameters using the data from our random sample. It is extremely important to have 
the best possible estimate of the population parameter(s). Having such estimates will lead to a better 
and more accurate statistical analysis. 


For example, in the area of phosphate mining in Florida, we may be interested in estimating the 
average radioactivity from both uranium and radium in a clay settling area of a mining site. Suppose 
that a random sample of 10 such sites resulted in a sample average of 40 pCi/g (picocuries/gram) of 
radioactivity. We may use this value as an estimate of the average radioactivity for all of the settling 
areas of mining sites in Florida. Because many Florida crops are grown on clay settling areas, this type 
of estimate is important for assessing the radioactivity-associated risks that are due to eating food
from the crops grown on these clay settling areas. 


We will now introduce some of the more useful statistical point estimation methods, discuss their 
properties, and illustrate their usefulness with a number of applications. The importance of point 
estimates lies in the fact that many statistical formulas are based on them. For example, the point 
estimates of mean and standard deviation are used in the calculation of confidence intervals and 
in many formulas for hypothesis testing. These topics are covered in subsequent chapters. Also, in 
most applied problems, a certain numerical characteristic of the physical phenomenon may be of 
interest; however, its value may not be observable directly. Instead, suppose it is possible to observe 
one or more random variables, the distribution of which depends on the characteristic of interest. Our 
objective will be to develop methods that use the observed values of random variables (sample data)
in order to gain information about the unknown and unobservable characteristic of the population. 


Let $X_1, \ldots, X_n$ be independent and identically distributed (iid) random variables (in statistical
language, a random sample) with a pdf or pf $f(x, \theta_1, \ldots, \theta_l)$, where $\theta_1, \ldots, \theta_l$ are the unknown population
parameters (characteristics of interest). For example, a normal pdf has parameters $\mu$ (the mean) and $\sigma^2$
(the variance). The actual values of these parameters are not known. The problem in point estimation
is to determine statistics $g_i(X_1, \ldots, X_n)$, $i = 1, \ldots, l$, which can be used to estimate the value of each
of the parameters, that is, to assign an appropriate value for the parameters $\theta = (\theta_1, \ldots, \theta_l)$ based
on observed sample data from the population. These statistics are called estimators for the parameters,
and the values calculated from these statistics using particular sample data values are called estimates
of the parameters. Estimators of $\theta_i$ are denoted by $\hat{\theta}_i$, where $\hat{\theta}_i = g_i(X_1, \ldots, X_n)$, $i = 1, \ldots, l$. Observe
that the estimators are random variables. As a result, an estimator has a distribution (which we called
the sampling distribution in Chapter 4). When we actually run the experiment and observe the data,
let the observed values of the random variables $X_1, \ldots, X_n$ be $x_1, \ldots, x_n$; then, $\hat{\theta}(X_1, \ldots, X_n)$ is an
estimator, and its value $\hat{\theta}(x_1, \ldots, x_n)$ is an estimate. For example, in case of the normal distribution,
the parameters of interest are $\theta_1 = \mu$ and $\theta_2 = \sigma^2$, that is, $\theta = (\mu, \sigma^2)$. If the estimators of $\mu$ and
$\sigma^2$ are $\bar{X} = (1/n) \sum_{i=1}^{n} X_i$ and $S^2 = (1/(n-1)) \sum_{i=1}^{n} (X_i - \bar{X})^2$, respectively, then the corresponding
estimates are $\bar{x} = (1/n) \sum_{i=1}^{n} x_i$ and $s^2 = (1/(n-1)) \sum_{i=1}^{n} (x_i - \bar{x})^2$, the mean and variance
corresponding to the particular observed sample values. In this book, we use capital letters such as $\bar{X}$ and
$S^2$ to represent the estimators, and lowercase letters such as $\bar{x}$ and $s^2$ to represent the estimates.
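The distinction between an estimator and an estimate can be mirrored in code: the estimator is a rule (a function of the sample), and the estimate is the number that rule returns for one observed sample. The Python sketch below is illustrative only; the five observations are hypothetical, and the function names are our own.

```python
def sample_mean(xs):
    """The estimator X-bar as a rule: average of the sample values."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """The estimator S^2 as a rule: squared deviations divided by n - 1."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

# Hypothetical observed values x_1, ..., x_5 (not data from the text).
observed = [2.1, 3.4, 1.9, 2.8, 3.0]

x_bar = sample_mean(observed)      # the estimate of mu for this sample
s2 = sample_variance(observed)     # the estimate of sigma^2 for this sample
```

A different observed sample would give different values of `x_bar` and `s2`, while the two functions stay the same.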


There are many methods available for estimating the true value(s) of the parameter(s) of interest. 
Three of the more popular methods of estimation are the method of moments, the method of max- 
imum likelihood, and Bayes’ method. A very popular procedure among econometricians to find a 
point estimator is the generalized method of moments. In this chapter we study only the method of 
moments and the method of maximum likelihood for obtaining point estimators and some of their 
desirable properties. In Chapter 11, we shall discuss Bayes’ method of estimation. 


There are many criteria for choosing a desired point estimator. Heuristically, some of them can 
be explained as follows (detailed coverage is given in Sections 5.2 through 5.5). An estimator, $\hat{\theta}$,
is unbiased if the mean of its sampling distribution is the parameter $\theta$. The bias of $\hat{\theta}$ is given by
$B = E(\hat{\theta}) - \theta$. The estimator satisfies the consistency property if the sample estimator has a high
probability of being close to the population value $\theta$ for a large sample size. The concept of efficiency
is based on comparing variances of the different unbiased estimators. If there are two unbiased 
estimators, it is desirable to have the one with the smaller variance. The estimator has the sufficiency 
property if it fully uses all the sample information. Minimal sufficient statistics are those that are 
sufficient for the parameter and are functions of every other set of sufficient statistics for those same 
parameters. A method due to Lehmann and Scheffé can be used to find a minimal sufficient statistic. 


5.2 THE METHOD OF MOMENTS 


How do we find a good estimator with desirable properties? One of the oldest methods for finding 
point estimators is the method of moments. This is a very simple procedure for finding an estimator 
for one or more population parameters. Let $\mu'_k = E[X^k]$ be the $k$th moment about the origin of a
random variable $X$, whenever it exists. Let $m'_k = (1/n) \sum_{i=1}^{n} X_i^k$ be the corresponding $k$th sample
moment. Then, the estimator of $\mu'_k$ by the method of moments is $m'_k$. The method of moments is
based on matching the sample moments with the corresponding population (distribution) moments
and is founded on the assumption that sample moments should provide good estimates of the
corresponding population moments. Because the population moments $\mu'_k = h_k(\theta_1, \theta_2, \ldots, \theta_l)$ are
often functions of the population parameters, we can equate corresponding population and sample
moments and solve for these parameters in terms of the moments.


METHOD OF MOMENTS 
Choose as estimates those values of the population parameters that are solutions of the equations
$\mu'_k = m'_k$, $k = 1, 2, \ldots, l$. Here $\mu'_k$ is a function of the population parameters.


For example, the first population moment is $\mu'_1 = E(X)$, and the first sample moment is $\bar{X} =
\sum_{i=1}^{n} X_i / n$. Hence, the moment estimator of $\mu'_1$ is $\bar{X}$. If $k = 2$, then the second population and
sample moments are $\mu'_2 = E(X^2)$ and $m'_2 = (1/n) \sum_{i=1}^{n} X_i^2$, respectively. Basically, we can use the
following procedure in finding point estimators of the population parameters using the method of 
moments. 


THE METHOD OF MOMENTS PROCEDURE 


Suppose there are $l$ parameters to be estimated, say $\theta = (\theta_1, \ldots, \theta_l)$.
1. Find $l$ population moments, $\mu'_k$, $k = 1, 2, \ldots, l$. Each $\mu'_k$ will contain one or more of the parameters $\theta_1, \ldots, \theta_l$.
2. Find the corresponding $l$ sample moments, $m'_k$, $k = 1, 2, \ldots, l$. The number of sample moments
should equal the number of parameters to be estimated.
3. From the system of equations $\mu'_k = m'_k$, $k = 1, 2, \ldots, l$, solve for the parameter $\theta = (\theta_1, \ldots, \theta_l)$;
this will be a moment estimator of $\theta$.


The following examples illustrate the method of moments for population parameter estimation. 


Example 5.2.1 
Let $X_1, \ldots, X_n$ be a random sample from a Bernoulli population with parameter $p$.
(a) Find the moment estimator for p. 
(b) Tossing a coin 10 times and equating heads to value 1 and tails to value 0, we obtained the 
following values: 


0 1 1 0 1 0 1 1 1 0


Obtain a moment estimate for p, the probability of success (head). 
Solution
(a) For the Bernoulli random variable, $\mu'_1 = E[X] = p$, so we can use $m'_1$ to estimate $p$. Thus,

$$m'_1 = \hat{p} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

Let

$$Y = \sum_{i=1}^{n} X_i.$$

Then, the method of moments estimator for $p$ is $\hat{p} = Y/n$. That is, the ratio of the total number of
heads to the total number of tosses will be an estimate of the probability of success.

(b) Note that this experiment results in Bernoulli random variables. Thus, using part (a) with $Y = 6$, we
find that the moment estimate of $p$ is $\hat{p} = 6/10 = 0.6$.
We would use this value, $\hat{p} = 0.6$, to answer any probabilistic questions for the given problem. For
example, what is the probability of obtaining exactly 8 heads out of 10 tosses of this coin? This can be
obtained by using the binomial formula with $p = 0.6$, that is, $P(X = 8) = \binom{10}{8} (0.6)^8 (0.4)^{10-8}$.
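The arithmetic of this example can be carried out with a short Python sketch. The function names are our own, and the code uses only the counts stated in the example (6 heads in 10 tosses).

```python
from math import comb

def bernoulli_moment_estimate(successes, n):
    """Method of moments estimate p-hat = Y / n."""
    return successes / n

def binomial_pmf(k, n, p):
    """P(X = k) for a binomial(n, p) random variable."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

p_hat = bernoulli_moment_estimate(6, 10)      # Y = 6 heads in n = 10 tosses
prob_8_heads = binomial_pmf(8, 10, p_hat)     # plug p-hat into the binomial formula
```

With $\hat{p} = 0.6$ the probability of exactly 8 heads in 10 tosses comes out to about 0.12.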


In Example 5.2.1, we used the method of moments to find a single parameter. We demonstrate in 
Example 5.2.2 how this method is used for estimating more than one parameter. 




Example 5.2.2 
Let $X_1, \ldots, X_n$ be a random sample from a gamma probability distribution with parameters $\alpha$ and $\beta$. Find
moment estimators for the unknown parameters $\alpha$ and $\beta$.


Solution 
For the gamma distribution (see Section 3.2.5), 


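$$E[X] = \alpha\beta \quad \text{and} \quad E[X^2] = \alpha\beta^2 + \alpha^2\beta^2.$$

Because there are two parameters, we need to find the first two moment estimators. Equating sample
moments to distribution (theoretical) moments, we have

$$\frac{1}{n} \sum_{i=1}^{n} X_i = \bar{X} = \alpha\beta, \quad \text{and} \quad \frac{1}{n} \sum_{i=1}^{n} X_i^2 = \alpha\beta^2 + \alpha^2\beta^2.$$

Solving for $\alpha$ and $\beta$, we obtain the estimates as $\hat{\alpha} = \bar{x}/\hat{\beta}$ and $\hat{\beta} = \left[\left\{(1/n) \sum_{i=1}^{n} x_i^2 - \bar{x}^2\right\}/\bar{x}\right]$.
Therefore, the method of moments estimators for $\alpha$ and $\beta$ are

$$\hat{\alpha} = \frac{\bar{X}}{\hat{\beta}}$$

and

$$\hat{\beta} = \frac{\frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar{X}^2}{\bar{X}} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n\bar{X}},$$

which implies that

$$\hat{\alpha} = \frac{\bar{X}}{\hat{\beta}} = \frac{\bar{X}^2}{\frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar{X}^2} = \frac{n\bar{X}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}.$$

Thus, we can use these values in the gamma pdf to answer questions concerning the probabilistic behavior
of the r.v. $X$.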
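To see these two formulas at work, the following Python sketch applies them to simulated gamma data. The true values $\alpha = 2$ and $\beta = 3$, the sample size, and the seed are assumptions made only for illustration; `random.gammavariate(alpha, beta)` in the Python standard library uses the same parameterization as this example, with mean $\alpha\beta$.

```python
import random

def gamma_moment_estimates(xs):
    """Return (alpha_hat, beta_hat) from the first two sample moments."""
    n = len(xs)
    xbar = sum(xs) / n
    sum_sq_dev = sum((x - xbar) ** 2 for x in xs)
    beta_hat = sum_sq_dev / (n * xbar)   # sum (x_i - xbar)^2 / (n * xbar)
    alpha_hat = xbar / beta_hat          # n * xbar^2 / sum (x_i - xbar)^2
    return alpha_hat, beta_hat

rng = random.Random(7)  # arbitrary seed
data = [rng.gammavariate(2.0, 3.0) for _ in range(5000)]
alpha_hat, beta_hat = gamma_moment_estimates(data)
```

By construction the estimates satisfy $\hat{\alpha}\hat{\beta} = \bar{x}$, and for a sample this large they should fall near the values used in the simulation.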




Example 5.2.3 
Let the distribution of $X$ be $N(\mu, \sigma^2)$.
(a) For a given sample of size $n$, use the method of moments to estimate $\mu$ and $\sigma^2$.
(b) The following data (rounded to the third decimal digit) were generated using Minitab from a
normal distribution with mean 2 and a standard deviation of 1.5.


3.163   1.883   3.252   3.716  -0.049  -0.653   0.057   2.987
4.098   1.670   1.396   2.332   1.838   3.024   2.706   0.231
3.830   3.349  -0.230   1.496


Obtain the method of moments estimates of the true mean and the true variance. 


Solution 
(a) For the normal distribution, $E(X) = \mu$, and because $Var(X) = E(X^2) - \mu^2$, we have the second
moment as $E(X^2) = \sigma^2 + \mu^2$.
Equating sample moments to distribution moments we have

$$\frac{1}{n} \sum_{i=1}^{n} X_i = \mu'_1 = \mu$$

and

$$\mu'_2 = \frac{1}{n} \sum_{i=1}^{n} X_i^2 = \sigma^2 + \mu^2.$$

Solving for $\mu$ and $\sigma^2$, we obtain the moment estimators as

$$\hat{\mu} = \bar{X}$$

and

$$\hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar{X}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$

(b) Because we know that the estimator of the mean is $\hat{\mu} = \bar{X}$ and the estimator of the variance is $\hat{\sigma}^2 =
(1/n) \sum_{i=1}^{n} X_i^2 - \bar{X}^2$, from the data the estimates are $\hat{\mu} = 2.005$ and $\hat{\sigma}^2 = 6.12 - (2.005)^2 = 2.1$.
Notice that the true mean is 2 and the true variance is 2.25, which we used to simulate the data.
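The estimates in part (b) can be reproduced from the 20 data values of the example with a few lines of Python. The variable names are our own.

```python
# The 20 observations listed in part (b) of Example 5.2.3.
data = [
    3.163, 1.883, 3.252, 3.716, -0.049, -0.653, 0.057, 2.987,
    4.098, 1.670, 1.396, 2.332, 1.838, 3.024, 2.706, 0.231,
    3.830, 3.349, -0.230, 1.496,
]

n = len(data)
mu_hat = sum(data) / n                          # first sample moment
second_moment = sum(x * x for x in data) / n    # second sample moment
sigma2_hat = second_moment - mu_hat**2          # moment estimate of the variance
```

Rounded, these give $\hat{\mu} = 2.005$, a second sample moment of 6.12, and $\hat{\sigma}^2 = 2.1$, in agreement with the solution.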


In general, using the population pdf we evaluate the lower order moments, finding expressions for the 
moments in terms of the corresponding parameters. Once we have population (theoretical) moments, 
we equate them to the corresponding sample moments to obtain the moment estimators. 


Example 5.2.4 
Let $X_1, \ldots, X_n$ be a random sample from a uniform distribution on the interval $[a, b]$. Obtain method of
moment estimators for a and b. 


Solution 

Here, a and b are treated as parameters. That is, we only know that the sample comes from a uniform 
distribution on some interval, but we do not know from which interval. Our interest is to estimate this 
interval. 

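The pdf of a uniform distribution is

$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a \le x \le b \\ 0, & \text{otherwise.} \end{cases}$$

Hence, the first two population moments are

$$\mu_1 = E(X) = \int_a^b \frac{x}{b - a}\,dx = \frac{a + b}{2} \quad \text{and} \quad \mu_2 = E(X^2) = \int_a^b \frac{x^2}{b - a}\,dx = \frac{a^2 + ab + b^2}{3}.$$

The corresponding sample moments are

$$\hat{\mu}_1 = \bar{X} \quad \text{and} \quad \hat{\mu}_2 = \frac{1}{n} \sum_{i=1}^{n} X_i^2.$$

Equating the first two sample moments to the corresponding population moments, we have

$$\hat{\mu}_1 = \frac{a + b}{2} \quad \text{and} \quad \hat{\mu}_2 = \frac{a^2 + ab + b^2}{3},$$

which, solving for $a$ and $b$, results in the moment estimators of $a$ and $b$,

$$\hat{a} = \hat{\mu}_1 - \sqrt{3\left(\hat{\mu}_2 - \hat{\mu}_1^2\right)} \quad \text{and} \quad \hat{b} = \hat{\mu}_1 + \sqrt{3\left(\hat{\mu}_2 - \hat{\mu}_1^2\right)}.$$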
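A small Python sketch of these two estimators follows, applied to simulated uniform data. The interval $[2, 5]$, the sample size, and the seed are assumptions chosen only to illustrate the formulas.

```python
import math
import random

def uniform_moment_estimates(xs):
    """Return (a_hat, b_hat) from the first two sample moments."""
    n = len(xs)
    m1 = sum(xs) / n
    m2 = sum(x * x for x in xs) / n
    half_width = math.sqrt(3.0 * (m2 - m1**2))
    return m1 - half_width, m1 + half_width

rng = random.Random(11)  # arbitrary seed
data = [rng.uniform(2.0, 5.0) for _ in range(4000)]
a_hat, b_hat = uniform_moment_estimates(data)
```

Note that nothing in these formulas forces $\hat{a} \le \min(x_i)$ or $\hat{b} \ge \max(x_i)$, so for some samples the estimated interval may fail to cover all of the observations.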


In Example 5.2.4, if $a = -b$, that is, $X_1, \ldots, X_n$ is a random sample from a uniform distribution on
the interval $(-b, b)$, the problem reduces to a one-parameter estimation problem. However, in this
case $E(X_i) = 0$, so the first moment cannot be used to estimate $b$. It becomes necessary to use the
second moment. For the derivation, see Exercise 5.2.3. 


It is important to observe that the method of moments estimators need not be unique. The following 
is an example of the nonuniqueness of moment estimators. 


Example 5.2.5 
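Let $X_1, \ldots, X_n$ be a random sample from a Poisson distribution with parameter $\lambda > 0$. Show that both
$(1/n) \sum_{i=1}^{n} X_i$ and $(1/n) \sum_{i=1}^{n} X_i^2 - \left((1/n) \sum_{i=1}^{n} X_i\right)^2$ are moment estimators of $\lambda$.


Solution
We know that $E(X) = \lambda$, from which we have a moment estimator of $\lambda$ as $(1/n) \sum_{i=1}^{n} X_i$. Also, because
we have $Var(X) = \lambda$, equating the second moments, we can see that

$$\lambda = E(X^2) - (EX)^2,$$

so that

$$\hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \left(\frac{1}{n} \sum_{i=1}^{n} X_i\right)^2.$$

Thus,

$$\hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} X_i$$

and

$$\hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} X_i^2 - \bar{X}^2.$$

Both are moment estimators of $\lambda$. Thus, the moment estimators may not be unique. We generally choose $\bar{X}$
as an estimator of $\lambda$, for its simplicity.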
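The nonuniqueness is easy to see numerically. In the Python sketch below, the ten counts are hypothetical values made up for illustration, not data from the text; the two moment estimators are applied to the same sample.

```python
# Hypothetical Poisson-type counts, chosen only for illustration.
counts = [2, 0, 3, 1, 4, 2, 1, 3, 2, 2]
n = len(counts)

# Estimator based on the first moment: the sample mean.
lam_hat_first = sum(counts) / n

# Estimator based on the first two moments: (1/n) sum x_i^2 - xbar^2.
lam_hat_second = sum(c * c for c in counts) / n - lam_hat_first**2
```

For this sample the first estimator gives 2.0 and the second gives 1.2, two different estimates of the same parameter $\lambda$.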
It is important to note that, in general, we have as many moment conditions as parameters.
In Example 5.2.5, we have more moment conditions than parameters, because both the mean and 
variance of Poisson random variables are the same. Given a sample, this results in two different 
estimates of a single parameter. One of the questions could be, can these two estimators be combined 
in some optimal way? This is done by the so-called generalized method of moments (GMM). We will 
not deal with this topic. 


As we have seen, the method of moments finds estimators of unknown parameters by equating the 
corresponding sample and population moments. This method often provides estimators when other 
methods fail to do so or when estimators are harder to obtain, as in the case of a gamma distribution. 
Compared to other methods, method of moments estimators are easy to compute and have some 
desirable properties that we will discuss in ensuing sections. The drawback is that they are usually 
not the “best estimators” (to be defined later) available and sometimes may even be meaningless. 


EXERCISES 5.2 


5.2.1. Let $X_1, \ldots, X_n$ be a random sample of size $n$ from the geometric distribution for which $p$
is the probability of success.


(a) Use the method of moments to find a point estimator for $p$.
(b) Use the following data (simulated from geometric distribution) to find the moment
estimator for $p$:


 2   5   7  43  18  19  16  11  22
 4  34  19  21  23   6  21   7  12


How will you use this information? [The pmf of a geometric distribution is $f(x) =
p(1 - p)^{x-1}$, for $x = 1, 2, \ldots$. Also $\mu = 1/p$.]


5.2.2. Let $X_1, \ldots, X_n$ be a random sample of size $n$ from the exponential distribution whose pdf
(by taking $\theta = 1/\beta$ in Definition 2.3.7) is

$$f(x, \theta) = \begin{cases} \theta e^{-\theta x}, & x \ge 0 \\ 0, & x < 0. \end{cases}$$

(a) Use the method of moments to find a point estimator for $\theta$.
(b) The following data represent the time intervals between the emissions of beta particles.


0.9  0.1  0.1  0.8  0.9  0.1  0.1  0.7  1.0  0.2
0.1  0.1  0.1  2.3  0.8  0.3  0.2  0.1  1.0  0.9
0.1  0.5  0.4  0.6  0.2  0.4  0.2  0.1  0.8  0.2
0.5  3.0  1.0  0.5  0.2  2.0  1.7  0.1  0.3  0.1
0.4  0.5  0.8  0.1  0.1  1.7  0.1  0.2  0.3  0.1
Assuming the data follow an exponential distribution, obtain a moment estimate for the
parameter $\theta$. Interpret.


5.2.3. Let $X_1, \ldots, X_n$ be a random sample from a uniform distribution on the interval
$(\theta - 1, \theta + 1)$.

(a) Find a moment estimator for $\theta$.

(b) Use the following data to obtain a moment estimate for $\theta$:


11.72  12.81  12.09  13.47  12.37


5.2.4. The probability density of a one-parameter Weibull distribution is given by

$$f(x) = \begin{cases} 2\alpha x e^{-\alpha x^2}, & x > 0 \\ 0, & \text{otherwise.} \end{cases}$$

(a) Using a random sample of size $n$, obtain a moment estimator for $\alpha$.
(b) Assuming that the following data are from a one-parameter Weibull population,


1.87  1.60  2.36  1.12  0.15
1.83  0.64  1.53  0.73  2.26


obtain a moment estimate of $\alpha$.


5.2.5. Let $X_1, \ldots, X_n$ be a random sample from the truncated exponential distribution with pdf

$$f(x) = \begin{cases} e^{-(x - \theta)}, & x \ge \theta \\ 0, & \text{otherwise.} \end{cases}$$

Find the method of moments estimate of $\theta$.


5.2.6. Let $X_1, \ldots, X_n$ be a random sample from a distribution with pdf

$$f(x, \alpha) = \frac{1 + \alpha x}{2}, \quad -1 \le x \le 1, \text{ and } -1 \le \alpha \le 1.$$

Find the moment estimators for $\alpha$.


5.2.7. Let $X_1, \ldots, X_n$ be a random sample from a population with pdf

$$f(x) = \begin{cases} \dfrac{2\alpha^2}{x^3}, & x \ge \alpha \\ 0, & \text{otherwise.} \end{cases}$$

Find a method of moments estimator for $\alpha$.


5.2.8. Let $X_1, \ldots, X_n$ be a random sample from a negative binomial distribution with pmf

$$p(x, r, p) = \binom{x + r - 1}{r - 1} p^r (1 - p)^x, \quad 0 \le p \le 1, \; x = 0, 1, 2, \ldots.$$

Find method of moments estimators for $r$ and $p$. [Here $E[X] = r(1 - p)/p$ and $E[X^2] =
r(1 - p)(r - rp + 1)/p^2$.]


5.2.9. Let $X_1, \ldots, X_n$ be a random sample from a distribution with pdf

$$f(x) = \begin{cases} (\theta + 1) x^{\theta}, & 0 \le x \le 1; \; \theta > -1 \\ 0, & \text{otherwise.} \end{cases}$$

Use the method of moments to obtain an estimator of $\theta$.


5.2.10. Let $X_1, \ldots, X_n$ be a random sample from a distribution with pdf

$$f(x) = \begin{cases} \dfrac{2\beta - 2x}{\beta^2}, & 0 < x < \beta \\ 0, & \text{otherwise.} \end{cases}$$

Use the method of moments to obtain an estimator of $\beta$.


5.2.11. Let $X_1, \ldots, X_n$ be a random sample with common mean $\mu$ and variance $\sigma^2$. Obtain a method
of moments estimator for $\sigma$.


5.2.12. Let $X_1, \ldots, X_n$ be a random sample from the beta distribution with parameters $\alpha$ and $\beta$.
Find the method of moments estimator for $\alpha$ and $\beta$.


5.2.13. Let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with unknown mean $\mu$ and
variance $\sigma^2$. Show that the method of moments estimators for $\mu$ and $\sigma^2$ are, respectively, the
sample mean $\bar{X}$ and $S'^2 = (1/n) \sum_{i=1}^{n} (X_i - \bar{X})^2$. Note that $S'^2 = [(n - 1)/n] S^2$, where $S^2$ is
the sample variance.


5.3 THE METHOD OF MAXIMUM LIKELIHOOD 


It is highly desirable to have a method that is generally applicable to the construction of statistical 
estimators that have “good” properties. In this section we present an important method for finding 
estimators of parameters proposed by geneticist/statistician Sir Ronald A. Fisher around 1922 called 
the method of maximum likelihood. Even though the method of moments is intuitive and easy to 
apply, it usually does not yield “good” estimators. The method of maximum likelihood is intuitively 
appealing, because we attempt to find the values of the true parameters that would have most likely 
produced the data that we in fact observed. For most cases of practical interest, the performance of 
maximum likelihood estimators is optimal for large enough data. This is one of the most versatile 
methods for fitting parametric statistical models to data. First, we define the concept of a likelihood 
function. 


Definition 5.3.1 Let $f(x_1, \ldots, x_n; \theta)$, $\theta \in \Theta \subseteq \mathbb{R}^k$, be the joint probability (or density) function of $n$
random variables $X_1, \ldots, X_n$ with sample values $x_1, \ldots, x_n$. The likelihood function of the sample is
given by

$$L(\theta; x_1, \ldots, x_n) = f(x_1, \ldots, x_n; \theta), \quad [= L(\theta), \text{ in a briefer notation}].$$

We emphasize that $L$ is a function of $\theta$ for fixed sample values.
If $X_1, \ldots, X_n$ are discrete iid random variables with probability function $p(x, \theta)$, then the likelihood
function is given by

$$\begin{aligned} L(\theta) &= P(X_1 = x_1, \ldots, X_n = x_n) \\ &= \prod_{i=1}^{n} P(X_i = x_i) \quad \text{(by multiplication rule for independent random variables)} \\ &= \prod_{i=1}^{n} p(x_i, \theta), \end{aligned}$$

and in the continuous case, if the density is $f(x, \theta)$, then the likelihood function is

$$L(\theta) = \prod_{i=1}^{n} f(x_i, \theta).$$

It is important to note that the likelihood function, although it depends on the observed sample values
$x = (x_1, \ldots, x_n)$, is to be regarded as a function of the parameter $\theta$. In the discrete case, $L(\theta; x_1, \ldots, x_n)$
gives the probability of observing $x = (x_1, \ldots, x_n)$, for a given $\theta$. Thus, the likelihood function is a
statistic, depending on the observed sample $x = (x_1, \ldots, x_n)$.


Example 5.3.1 


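Let $X_1, \ldots, X_n$ be iid $N(\mu, \sigma^2)$ random variables. Let $x_1, \ldots, x_n$ be the sample values. Find the likelihood
function.

Solution

The density function for the normal variable is given by $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$. Hence, the likelihood
function is

$$L(\mu, \sigma^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) = \frac{1}{(2\pi)^{n/2} \sigma^n} \exp\left(-\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2}\right).$$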


A statistical procedure should be consistent with the assumption that the best explanation of a set 
of data is provided by an estimator $\hat{\theta}$, which will be the value of the parameter $\theta$ that maximizes the
likelihood function. This value of $\theta$ will be called the maximum likelihood estimator. The goal of
maximum likelihood estimation is to find the parameter value(s) that makes the observed data most 
likely. 


Definition 5.3.2 The maximum likelihood estimators (MLEs) are those values of the parameters that 
maximize the likelihood function with respect to the parameter $\theta$. That is,

$$L(\hat{\theta}; x_1, \ldots, x_n) = \max_{\theta \in \Theta} L(\theta; x_1, \ldots, x_n),$$

where $\Theta$ is the set of possible values of the parameter $\theta$.
The method of maximum likelihood extends to the case of several parameters. Let $X_1, \ldots, X_n$ be a
random sample with joint pmf (if discrete) or pdf (if continuous)

$$L(\theta_1, \ldots, \theta_m; x_1, \ldots, x_n) = f(x_1, x_2, \ldots, x_n; \theta_1, \theta_2, \ldots, \theta_m),$$

where the values of the parameters $\theta_1, \ldots, \theta_m$ are unknown and $x_1, \ldots, x_n$ are the observed sample
values. Then, the maximum likelihood estimates $\hat{\theta}_1, \ldots, \hat{\theta}_m$ are those values of the $\theta_i$s that maximize
the likelihood function, so that

$$f(x_1, \ldots, x_n; \hat{\theta}_1, \ldots, \hat{\theta}_m) \ge f(x_1, \ldots, x_n; \theta_1, \ldots, \theta_m)$$

for all allowable $\theta_1, \ldots, \theta_m$.


Note that the likelihood function conveys to us how feasible the observed sample is as a function of 
the possible parameter values. Maximum likelihood estimates give the parameter values for which 
the observed sample is most likely to have been generated. In general, the maximum likelihood 
method results in the problem of maximizing a function of single or several variables. Hence, in most 
situations, the methods of calculus can be used. In deriving the MLEs, however, there are situations 
where the techniques developed are more problem specific. Sometimes we need to use numerical 
methods, such as Newton’s method. 


In order to find an MLE, we need only to compute the likelihood function and then maximize that
function with respect to the parameter of interest. In many cases, it is easier to work with the natural 
logarithm (In) of the likelihood function, called the log-likelihood function. Because the natural log- 
arithm function is increasing, the maximum value of the likelihood function, if it exists, will occur 
at the same point as the maximum value of the log-likelihood function. We now summarize the 
calculus-based procedure to find MLEs. 


PROCEDURE TO FIND MLE 

1. Define the likelihood function, $L(\theta)$.

2. Often it is easier to take the natural logarithm (ln) of $L(\theta)$.

3. When applicable, differentiate $\ln L(\theta)$ with respect to $\theta$, and then equate the derivative to zero.

4. Solve for the parameter $\theta$, and we will obtain $\hat{\theta}$.

5. Check whether it is a maximizer or global maximizer.

Example 5.3.2 


Suppose $X_1, \ldots, X_n$ are a random sample from a geometric distribution with parameter $p$, $0 < p < 1$.
Find the MLE $\hat{p}$.


Solution
For the geometric distribution, the pmf is given by

$$f(x, p) = p(1 - p)^{x-1}, \quad 0 < p < 1, \quad x = 1, 2, 3, \ldots.$$

Hence, the likelihood function is

$$L(p) = \prod_{i=1}^{n} \left[p(1 - p)^{x_i - 1}\right] = p^n (1 - p)^{-n + \sum_{i=1}^{n} x_i}.$$

Taking the natural logarithm of $L(p)$,

$$\ln L = n \ln p + \left(-n + \sum_{i=1}^{n} x_i\right) \ln(1 - p).$$

Taking the derivative with respect to $p$, we have

$$\frac{d \ln L}{dp} = \frac{n}{p} - \frac{-n + \sum_{i=1}^{n} x_i}{1 - p}.$$

Equating $\frac{d \ln L(p)}{dp}$ to zero, we have

$$\frac{n}{p} - \frac{-n + \sum_{i=1}^{n} x_i}{1 - p} = 0.$$

Solving for $p$,

$$p = \frac{n}{\sum_{i=1}^{n} x_i} = \frac{1}{\bar{x}}.$$

Thus, we obtain a maximum likelihood estimator of $p$ as

$$\hat{p} = \frac{n}{\sum_{i=1}^{n} X_i} = \frac{1}{\bar{X}}.$$

We remark that $1/\bar{x}$ is the maximum likelihood estimate of $p$. It can be shown that $\hat{p}$ is a global maximum.
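The closed-form answer can be checked numerically by evaluating the log-likelihood on a grid of $p$ values. In the Python sketch below, the eight observations are hypothetical (each is the number of trials up to and including the first success), and the grid spacing of 0.001 is an arbitrary choice.

```python
import math

def geometric_log_likelihood(p, xs):
    """ln L(p) = n ln p + (sum(x_i) - n) ln(1 - p), for 0 < p < 1."""
    n = len(xs)
    return n * math.log(p) + (sum(xs) - n) * math.log(1.0 - p)

# Hypothetical observations, chosen only for illustration.
trials = [3, 1, 4, 2, 2, 6, 1, 3]

p_hat = len(trials) / sum(trials)   # closed-form MLE: n / sum(x_i) = 1 / xbar

# Brute-force check: the best grid point should sit next to p_hat.
grid = [k / 1000 for k in range(1, 1000)]
best_on_grid = max(grid, key=lambda p: geometric_log_likelihood(p, trials))
```

A grid search of this kind is only a sanity check; it does not replace the argument that $\hat{p}$ is a global maximum.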


Example 5.3.3 
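Suppose $X_1, \ldots, X_n$ is a random sample from a Poisson distribution with parameter $\lambda$. Find the MLE $\hat{\lambda}$.


Solution
We have the probability mass function

$$p(x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots, \quad \lambda > 0.$$

Hence, the likelihood function is

$$L(\lambda) = \prod_{i=1}^{n} \frac{\lambda^{x_i} e^{-\lambda}}{x_i!} = \frac{\lambda^{\sum_{i=1}^{n} x_i} e^{-n\lambda}}{\prod_{i=1}^{n} x_i!}.$$

Then, taking the natural logarithm, we have

$$\ln L(\lambda) = \sum_{i=1}^{n} x_i \ln \lambda - n\lambda - \sum_{i=1}^{n} \ln(x_i!),$$

and differentiating with respect to $\lambda$ results in

$$\frac{d \ln L(\lambda)}{d\lambda} = \frac{\sum_{i=1}^{n} x_i}{\lambda} - n,$$

and

$$\frac{d \ln L(\lambda)}{d\lambda} = 0 \quad \text{implies} \quad \frac{\sum_{i=1}^{n} x_i}{\lambda} - n = 0.$$

That is,

$$\lambda = \frac{\sum_{i=1}^{n} x_i}{n} = \bar{x}.$$

Hence, the MLE of $\lambda$ is

$$\hat{\lambda} = \bar{X}.$$
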

It can be verified that the second derivative is negative and, hence, we really have a maximum. 
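The closed forms in Examples 5.3.2 and 5.3.3 can be checked numerically. The following is a minimal sketch, not part of the original text: the data values and the helper `grid_argmax` are hypothetical, and a coarse grid search stands in for a proper optimizer.

```python
import math

def poisson_log_likelihood(lam, xs):
    """ln L(lambda) = sum(x_i) ln(lambda) - n*lambda - sum ln(x_i!)."""
    return (sum(xs) * math.log(lam) - len(xs) * lam
            - sum(math.lgamma(x + 1) for x in xs))

def geometric_log_likelihood(p, xs):
    """ln L(p) = n ln p + (sum(x_i) - n) ln(1 - p), for x_i = 1, 2, 3, ..."""
    n = len(xs)
    return n * math.log(p) + (sum(xs) - n) * math.log(1.0 - p)

def grid_argmax(func, lo, hi, steps=20000):
    """Return the interior grid point of (lo, hi) at which func is largest."""
    best_t, best_val = None, -math.inf
    for k in range(1, steps):
        t = lo + (hi - lo) * k / steps
        val = func(t)
        if val > best_val:
            best_t, best_val = t, val
    return best_t

counts = [2, 0, 3, 1, 4, 2]   # hypothetical Poisson-type observations
trials = [1, 3, 2, 5, 1, 4]   # hypothetical geometric-type observations

lam_closed = sum(counts) / len(counts)   # lambda-hat = sample mean
p_closed = len(trials) / sum(trials)     # p-hat = 1 / sample mean

lam_grid = grid_argmax(lambda t: poisson_log_likelihood(t, counts), 0.0, 10.0)
p_grid = grid_argmax(lambda t: geometric_log_likelihood(t, trials), 0.0, 1.0)
```

Under these assumptions the grid maximizers should agree with the closed-form estimates up to the grid spacing.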


Sometimes the method of derivatives cannot be used for finding the MLEs. For example, the likelihood
may not be differentiable over the whole parameter range. In this case, we need to make use of the special
structures available in the specific situation to solve the problem. The following is one such case.
Example 5.3.4 
Let $X_1, \ldots, X_n$ be a random sample from $U(0, \theta)$, $\theta > 0$. Find the MLE of $\theta$.


Solution 
Note that the pdf of the uniform distribution is

\[ f(x) = \begin{cases} \dfrac{1}{\theta}, & 0 \le x \le \theta \\ 0, & \text{otherwise.} \end{cases} \]

FIGURE 5.1 Likelihood function for uniform probability distribution.


Hence, the likelihood function is given by

\[ L(\theta; x_1, x_2, \ldots, x_n) = \begin{cases} \dfrac{1}{\theta^{n}}, & 0 \le x_1, x_2, \ldots, x_n \le \theta \\ 0, & \text{otherwise.} \end{cases} \]

When $\theta \ge \max(x_i)$, the likelihood is $1/\theta^{n}$, which is positive and decreasing as a function of $\theta$ (for fixed $n$).
However, for $\theta < \max(x_i)$ the likelihood drops to 0, creating a discontinuity at the point $\max(x_i)$ (this is the
minimum value of $\theta$ that can be chosen which still satisfies the condition $0 \le x_i \le \theta$), and Figure 5.1 shows
that the maximum occurs at this point. Because the likelihood is discontinuous there, the maximizer cannot be
found by setting a derivative to zero. Thus, the MLE is the largest order statistic,

\[ \hat{\theta} = \max (X_i) = X_{(n)}. \]


In the previous example, because $E(X) = \theta/2$, we can see that $\theta = 2E(X)$. Hence, the method
of moments estimator for $\theta$ is $\hat{\theta} = 2\bar{X}$. Sometimes the method of moments estimator can give
meaningless results. To see this, suppose we observe values 3, 5, 6, and 18 from a $U(0, \theta)$ distribution.
Clearly, the maximum likelihood estimate of $\theta$ is 18, whereas the method of moments estimate is
16, which is not quite acceptable, because we have already observed a value of 18.
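The comparison for the four observed values can be reproduced directly. This is a small illustrative sketch; the function names are ours and not from the text.

```python
def uniform_mle(xs):
    """MLE of theta for U(0, theta): the largest observation."""
    return max(xs)

def uniform_mom(xs):
    """Method of moments estimate of theta for U(0, theta): twice the sample mean."""
    return 2.0 * sum(xs) / len(xs)

def uniform_likelihood(theta, xs):
    """L(theta) = theta^(-n) if every observation lies in [0, theta], else 0."""
    if theta <= 0 or min(xs) < 0 or max(xs) > theta:
        return 0.0
    return theta ** (-len(xs))

data = [3, 5, 6, 18]
theta_mle = uniform_mle(data)
theta_mom = uniform_mom(data)
```

Evaluating `uniform_likelihood(theta_mom, data)` gives zero likelihood, which is one way to see why the method of moments value is not acceptable here.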


As mentioned earlier, if the unknown parameter $\theta$ represents a vector of parameters, say $\theta =
(\theta_1, \ldots, \theta_l)$, then the MLEs can be obtained from solutions of the system of equations

\[ \frac{\partial}{\partial \theta_i} \ln L(\theta_1, \ldots, \theta_l) = 0, \quad \text{for } i = 1, \ldots, l. \]

These are called the maximum likelihood equations and the solutions are denoted by $(\hat{\theta}_1, \ldots, \hat{\theta}_l)$.

Example 5.3.5 
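Let $X_1, \ldots, X_n$ be $N(\mu, \sigma^2)$.
(a) If $\mu$ is unknown and $\sigma^2 = \sigma_0^2$ is known, find the MLE for $\mu$.
(b) If $\mu = \mu_0$ is known and $\sigma^2$ is unknown, find the MLE for $\sigma^2$.
(c) If $\mu$ and $\sigma^2$ are both unknown, find the MLE for $\theta = (\mu, \sigma^2)$.

Solution
In order to avoid notational confusion when taking the derivative, let $\theta = \sigma^2$. Then, the likelihood function
is

\[ L(\mu, \theta) = (2\pi\theta)^{-n/2} \exp\left( -\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\theta} \right) \]

or

\[ \ln L(\mu, \theta) = -\frac{n}{2} \ln(2\pi) - \frac{n}{2} \ln\theta - \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\theta}. \]

(a) When $\theta = \theta_0 = \sigma_0^2$ is known, the problem reduces to estimating the single parameter $\mu$.
Differentiating the log-likelihood function with respect to $\mu$,

\[ \frac{\partial}{\partial\mu} \ln L(\mu, \theta_0) = \frac{2\sum_{i=1}^{n} (x_i - \mu)}{2\theta_0}. \]

Setting the derivative equal to zero and solving for $\mu$,

\[ \sum_{i=1}^{n} (x_i - \mu) = 0. \]

From this,

\[ \sum_{i=1}^{n} x_i = n\mu \quad \text{or} \quad \mu = \bar{x}. \]

Thus, we get $\hat{\mu} = \bar{X}$.

(b) When $\mu = \mu_0$ is known, the problem reduces to estimating the single parameter $\sigma^2 = \theta$.
Differentiating the log-likelihood function with respect to $\theta$,

\[ \frac{\partial \ln L(\mu_0, \theta)}{\partial\theta} = -\frac{n}{2\theta} + \frac{\sum_{i=1}^{n} (x_i - \mu_0)^2}{2\theta^2}. \]

Setting the derivative equal to zero and solving for $\theta$, we get

\[ \hat{\theta} = \hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (X_i - \mu_0)^2}{n}. \]

(c) When both $\mu$ and $\theta$ are unknown, we need to differentiate with respect to both $\mu$ and $\theta$ individually:

\[ \frac{\partial \ln L(\mu, \theta)}{\partial\mu} = \frac{2\sum_{i=1}^{n} (x_i - \mu)}{2\theta} \]

and

\[ \frac{\partial \ln L(\mu, \theta)}{\partial\theta} = -\frac{n}{2\theta} + \frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\theta^2}. \]

Setting the derivatives equal to zero and solving simultaneously, we obtain

\[ \hat{\mu} = \bar{X}, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2. \]

Note that in (a) and (c), the estimates for $\mu$ are the same; however, in (b) and (c), the estimates for
$\sigma^2$ are different.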
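The three cases of Example 5.3.5 differ only in which mean is plugged into the variance formula. A minimal sketch follows; the function name `normal_mle`, the data, and the value of `mu0` are illustrative assumptions.

```python
def normal_mle(xs, mu0=None):
    """Return (mu_hat, sigma2_hat) for a normal sample.

    If mu0 is given (case (b), known mean), the variance MLE is centered at mu0;
    otherwise (case (c)) both estimates use the sample mean.
    Note the divisor n, not n - 1.
    """
    n = len(xs)
    mu_hat = sum(xs) / n if mu0 is None else mu0
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n
    return mu_hat, sigma2_hat

sample = [4.0, 7.0, 13.0, 16.0]                 # hypothetical observations
mu_c, var_c = normal_mle(sample)                # both parameters unknown
mu_b, var_b = normal_mle(sample, mu0=9.0)       # mean assumed known to be 9
```

For this sample the two variance estimates differ (22.5 versus 23.5), which matches the remark that cases (b) and (c) give different estimates of the variance.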


At times, the maximum likelihood estimators may be hard to calculate. It may be necessary to use 
numerical methods to approximate values of the estimate. The following example gives one such case. 


Example 5.3.6 
Let $X_1, \ldots, X_n$ be a random sample from a population with gamma distribution and parameters $\alpha$ and $\beta$.
Find MLEs for the unknown parameters $\alpha$ and $\beta$.


Solution 
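The pdf for the gamma distribution is given by

\[ f(x) = \begin{cases} \dfrac{1}{\Gamma(\alpha)\beta^{\alpha}} x^{\alpha-1} e^{-x/\beta}, & x > 0, \ \alpha > 0, \ \beta > 0 \\ 0, & \text{otherwise.} \end{cases} \]

The likelihood function is given by

\[ L = L(\alpha, \beta) = \frac{1}{\left( \Gamma(\alpha)\beta^{\alpha} \right)^{n}} \prod_{i=1}^{n} x_i^{\alpha-1} \, e^{-\sum_{i=1}^{n} x_i/\beta}. \]

Taking the logarithms gives

\[ \ln L = -n \ln\Gamma(\alpha) - n\alpha \ln\beta + (\alpha - 1) \sum_{i=1}^{n} \ln x_i - \frac{\sum_{i=1}^{n} x_i}{\beta}. \]

Now taking the partial derivatives with respect to $\alpha$ and $\beta$ and setting both equal to zero, we have

\[ \frac{\partial}{\partial\alpha} \ln L = -n \frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - n \ln\beta + \sum_{i=1}^{n} \ln x_i = 0 \]

\[ \frac{\partial}{\partial\beta} \ln L = -\frac{n\alpha}{\beta} + \frac{\sum_{i=1}^{n} x_i}{\beta^2} = 0. \]

Solving the second one to get $\beta$ in terms of $\alpha$, we have

\[ \beta = \frac{\bar{x}}{\alpha}. \]

Substituting this $\beta$ in the first equation, we have to solve

\[ -n \frac{\Gamma'(\alpha)}{\Gamma(\alpha)} - n \ln\frac{\bar{x}}{\alpha} + \sum_{i=1}^{n} \ln x_i = 0 \]

for $\alpha > 0$. There is no closed-form solution for $\alpha$ and $\beta$. In this case, one can use numerical methods such as
the Newton-Raphson method to solve for $\alpha$, and then use this value to find $\beta$.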


There are many references available on the Web explaining the Newton-Raphson method; for
instance, http://web.as.uky.edu/statistics/users/viele/sta601s08/nummax.pdf gives the algorithm for
the gamma distribution.
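As a rough illustration of the numerical step in Example 5.3.6, the sketch below applies Newton-Raphson to the score equation divided by $n$, that is, to $g(\alpha) = \ln\alpha - \psi(\alpha) - (\ln\bar{x} - \overline{\ln x}) = 0$, where $\psi = \Gamma'/\Gamma$. It is not the algorithm from the cited reference. The series approximations for $\psi$ and $\psi'$, the starting value, and the simulated data are all assumptions made for this example.

```python
import math
import random

def digamma(x):
    """psi(x) = Gamma'(x)/Gamma(x), via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:                      # psi(x) = psi(x + 1) - 1/x
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252)))

def trigamma(x):
    """psi'(x), via recurrence plus an asymptotic series."""
    result = 0.0
    while x < 6.0:                      # psi'(x) = psi'(x + 1) + 1/x^2
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return (result + inv + 0.5 * inv2
            + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 / 42)))

def gamma_mle(xs, tol=1e-12, max_iter=100):
    """Newton-Raphson for alpha; beta then follows from beta = x_bar / alpha."""
    n = len(xs)
    mean = sum(xs) / n
    s = math.log(mean) - sum(math.log(x) for x in xs) / n   # positive unless all x equal
    alpha = 0.5 / s                                         # assumed starting value
    for _ in range(max_iter):
        g = math.log(alpha) - digamma(alpha) - s
        g_prime = 1.0 / alpha - trigamma(alpha)
        step = g / g_prime
        alpha -= step
        if abs(step) < tol * alpha:
            break
    return alpha, mean / alpha

random.seed(0)
sample = [random.gammavariate(3.0, 2.0) for _ in range(2000)]  # simulated, shape 3, scale 2
alpha_hat, beta_hat = gamma_mle(sample)
```

The starting value $1/(2s)$ lies to the left of the root because $\ln\alpha - \psi(\alpha) > 1/(2\alpha)$, so the iterates increase toward the solution; other starting rules would also work.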


In only a few cases are we able to obtain a simple form for the maximum likelihood equation that 
can be solved by setting the first derivative to zero. Often we cannot write an equation that can be 
differentiated to find the MLE parameter estimates. This is especially true in the situation where the 
model is complex and involves many parameters. Evaluating the likelihood exhaustively for all values 
of the parameters becomes almost impossible, even with modern computers. This is why so-called 
optimization algorithms have become indispensable to statisticians. The purpose of an optimization 
algorithm is to find as fast as possible the set of parameter values that make the observed data most 
likely. There are many such algorithms available. We describe the Newton-Raphson method in Project 
5F, and another powerful algorithm, known as the EM algorithm, is given in Section 13.4. 


Sometimes, it may be necessary to estimate a function of a parameter. The following invariance 
property of maximum likelihood estimators is very useful in those cases. 


Theorem 5.3.1 Let $h(\theta)$ be a one-to-one function of $\theta$. If $\hat{\theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_l)$ is the MLE of $\theta = (\theta_1, \ldots, \theta_l)$,
then the MLE of a function $h(\theta) = (h_1(\theta), \ldots, h_k(\theta))$ of these parameters is $h(\hat{\theta}) = (h_1(\hat{\theta}), \ldots, h_k(\hat{\theta}))$ for
$1 \le k \le l$.


As a consequence of the invariance property, in Example 5.3.5, we can obtain the estimator of the
true standard deviation as

\[ \hat{\sigma} = \sqrt{\hat{\sigma}^2} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2}. \]

It is also known that, under very general conditions on the joint distribution of the sample and for
a large sample size $n$, the MLE $\hat{\theta}$ is approximately the minimum variance unbiased estimator (this
concept is introduced in the next section) of $\theta$.


EXERCISES 5.3 


5.3.1. Let $X_1, \ldots, X_n$ be a random sample recorded as heads or tails resulting from tossing a coin $n$
times with unknown probability $p$ of heads. Find the MLE $\hat{p}$ of $p$. Also using the invariance
property, obtain an MLE for $q = 1 - p$. How would you use the results you have obtained?

5.3.2. Suppose $X_1, \ldots, X_n$ are a random sample from an exponential distribution with parameter
$\theta$. Find the MLE of $\theta$. Also using the invariance property, obtain an MLE for the variance.

5.3.3. Let $X$ be a random variable representing the time between successive arrivals at a checkout
counter in a supermarket. The values of $X$ in minutes (rounded to the nearest minute) are

     1   2   3   7  11   4  13
    12   7   3   2  11   7   2

Assume that the pdf of $X$ is $f(x) = (1/\theta)e^{-x/\theta}$. Use these data to find the MLE $\hat{\theta}$. How can you
use this estimate you have just derived?

5.3.4. Let $X_1, \ldots, X_n$ be a random sample from the truncated exponential distribution with pdf

\[ f(x) = \begin{cases} e^{-(x-\theta)}, & x > \theta \\ 0, & \text{otherwise.} \end{cases} \]

Show that the MLE of $\theta$ is $\min(X_i)$.

5.3.5. The pdf of a random variable $X$ is given by

\[ f(x) = \begin{cases} \dfrac{2x}{\alpha^2} e^{-x^2/\alpha^2}, & x > 0 \\ 0, & \text{otherwise.} \end{cases} \]

Using a random sample of size $n$, obtain the MLE $\hat{\alpha}$ for $\alpha$.

5.3.6. The pdf of a random variable $X$ is given by

P(X=n)= J, exp (an PH OUD se

Using a random sample of size $n$, obtain the MLE $\hat{\alpha}$ for $\alpha$.

5.3.7. Let $X_1, \ldots, X_n$ be a random sample from a two-parameter Weibull distribution with pdf

\[ f(x) = \begin{cases} \dfrac{\alpha}{\beta^{\alpha}} x^{\alpha-1} e^{-(x/\beta)^{\alpha}}, & x > 0 \\ 0, & \text{otherwise.} \end{cases} \]

Find the MLEs of $\alpha$ and $\beta$.

5.3.8. Let $X_1, \ldots, X_n$ be a random sample from a Rayleigh distribution with pdf

\[ f(x) = \begin{cases} \dfrac{x}{\alpha} e^{-x^2/2\alpha}, & x > 0 \\ 0, & \text{otherwise.} \end{cases} \]

Find the MLE of $\alpha$.


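5.3.9. Let $X_1, \ldots, X_n$ be a random sample from a two-parameter exponential population with
density

\[ f(x; \theta, \nu) = \frac{1}{\theta} e^{-(x-\nu)/\theta}, \quad \text{for } x > \nu, \ \theta > 0. \]

Find MLEs for $\theta$ and $\nu$ when both are unknown.

5.3.10. Let $X_1, \ldots, X_n$ be a random sample from the shifted exponential distribution with pdf

\[ f(x) = \begin{cases} \lambda e^{-\lambda(x-\theta)}, & x > \theta \\ 0, & \text{otherwise.} \end{cases} \]

Obtain the maximum likelihood estimators of $\theta$ and $\lambda$.

5.3.11. Let $X_1, \ldots, X_n$ be a random sample on [0, 1] with pdf

\[ f(x; \theta) = \frac{\Gamma(2\theta)}{\Gamma(\theta)^2} \left[ x(1-x) \right]^{\theta-1}, \quad \theta > 0. \]

What equation does the maximum likelihood estimate of $\theta$ satisfy?

5.3.12. Let $X_1, \ldots, X_n$ be a random sample with pdf

\[ f(x) = \begin{cases} (\alpha + 1)x^{\alpha}, & 0 \le x \le 1 \\ 0, & \text{otherwise.} \end{cases} \]

Find the MLE of $\alpha$.

5.3.13. Let $X_1, \ldots, X_n$ be a random sample from a uniform distribution with pdf

\[ f(x) = \begin{cases} \dfrac{1}{3\theta + 2}, & 0 \le x \le 3\theta + 2 \\ 0, & \text{otherwise.} \end{cases} \]

Obtain the MLE of $\theta$.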


5.3.14. Let X1,..., X, bea random sample from a Cauchy distribution with pdf 


ff) = [ ~O< xX < OO. 


Find the MLE for £. 
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5.3.15. The following data represent the amount of leakage of a fluorescent dye from the 
bloodstream into the eye in patients with abnormal retinas: 


16 14 12 22 1.48 1.7 
18 63 24 2.3 189 22.8 


Assuming that these data come from a normal distribution, find the maximum likelihood 
estimate of (i, 0). 


5.3.16. Let $X_1, \ldots, X_n$ be a random sample from a population with gamma distribution and
parameters $\alpha$ and $\beta$. Show that the MLE of $\mu = \alpha\beta$ is the sample mean $\hat{\mu} = \bar{X}$.


5.3.17. The lifetimes $X$ of a certain brand of component used in a machine can be modeled as
a random variable with pdf $f(x) = (1/\theta)e^{-x/\theta}$. The reliability $R(x)$ of the component is
defined as $R(x) = 1 - F(x)$. Suppose $X_1, X_2, \ldots, X_n$ are the lifetimes of $n$ components
randomly selected and tested. Find the MLE of $R(x)$.


5.3.18. Using the method explained in Project 4A, generate 20 observations of a random variable 
having an exponential distribution with mean and standard deviation both equal to 2. What 
is the maximum likelihood estimate of the population mean? How much is the observed 
error? 


5.3.19. Let $X_1, \ldots, X_n$ be a random sample from a Pareto distribution (named after the economist
Vilfredo Pareto) with shape parameter $\alpha$. The density function is given by

\[ f(x) = \begin{cases} \dfrac{\alpha}{x^{\alpha+1}}, & x \ge 1 \\ 0, & \text{otherwise.} \end{cases} \]

(The Pareto distribution is a skewed, heavy-tailed distribution. Sometimes it is used to model
the distribution of incomes.) Show that the maximum likelihood estimator of $\alpha$ is

\[ \hat{\alpha} = \frac{n}{\sum_{i=1}^{n} \ln (X_i)}. \]

5.3.20. Let $X_1, \ldots, X_n$ be a random sample from $N(\theta, \theta)$, $0 < \theta < \infty$. Find the maximum likelihood
estimate of $\theta$.


5.4 SOME DESIRABLE PROPERTIES OF POINT ESTIMATORS 


Two different methods of finding estimators for population parameters have been introduced in the
preceding sections. We have seen that it is possible to have several estimators for the same parameter.
For a practitioner of statistics, an important question is which of the many available sample
statistics, such as the mean, median, smallest observation, or largest observation, should be chosen
to represent all of the sample. Should we use the method of moments estimator, the maximum
likelihood estimator, or an estimator obtained through some other method such as least squares (we will
see this method in Chapter 8)? Now we introduce some common ways to distinguish between them
by looking at some desirable properties of these estimators.


5.4.1 Unbiased Estimators 


It is desirable to have the property that the expected value of an estimator of a parameter is equal to 
the true value of the parameter. Such estimators are called unbiased estimators. 


Definition 5.4.1 A point estimator $\hat{\theta}$ is called an unbiased estimator of the parameter $\theta$ if $E(\hat{\theta}) = \theta$ for
all possible values of $\theta$. Otherwise $\hat{\theta}$ is said to be biased. Furthermore, the bias of $\hat{\theta}$ is given by

\[ B = E(\hat{\theta}) - \theta. \]

Note that the bias is nothing but the expected value of the (random) error, $E(\hat{\theta} - \theta)$. Thus, the
estimator is unbiased if the bias is 0 for all values of $\theta$. The bias occurs when a sample does not
accurately represent the population from which the sample is taken. It is important to observe that
in order to check whether $\hat{\theta}$ is unbiased, it is not necessary to know the value of the true parameter.
Instead, one can use the sampling distribution of $\hat{\theta}$. We demonstrate the basic procedure through the
following example.

Example 5.4.1 
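Let $X_1, \ldots, X_n$ be a random sample from a Bernoulli population with parameter $p$. Show that the method
of moments estimator is also an unbiased estimator.


Solution
We can verify that the moment estimator of $p$ is

\[ \hat{p} = \frac{Y}{n} = \frac{1}{n} \sum_{i=1}^{n} X_i, \]

where $Y = \sum_{i=1}^{n} X_i$ is a binomial random variable. Because for binomial random variables, $E(Y) = np$, it follows that

\[ E(\hat{p}) = E\left( \frac{Y}{n} \right) = \frac{1}{n} E(Y) = \frac{1}{n} \cdot np = p. \]

Hence, $\hat{p} = Y/n$ is an unbiased estimator for $p$.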


In fact, we have the following result, which states that the sample mean is always an unbiased estimator 
of the population mean. 


Theorem 5.4.1 The mean $\bar{X}$ of a random sample is an unbiased estimator of the population mean $\mu$.

Proof. Let $X_1, \ldots, X_n$ be random variables with mean $\mu$. Then, the sample mean is $\bar{X} = (1/n) \sum_{i=1}^{n} X_i$, and

\[ E\bar{X} = \frac{1}{n} \sum_{i=1}^{n} EX_i = \frac{1}{n} \cdot n\mu = \mu. \]

Hence, $\bar{X}$ is an unbiased estimator of $\mu$.


How is this interpreted in practice? Suppose that a data set is collected with $n$ numerical observations
$x_1, \ldots, x_n$. The resulting sample mean may be either less than or greater than the true population
mean, $\mu$ (remember, we do not know this value). If the sampling experiment were repeated many
times, then the average of the estimates calculated over these repetitions of the sampling experiment
would approach the true population mean.


If we have to choose among several different estimators of a parameter $\theta$, it is desirable to select one
that is unbiased. The following result states that the sample variance $S^2 = (1/(n-1)) \sum_{i=1}^{n} (X_i - \bar{X})^2$
is an unbiased estimator of the population variance $\sigma^2$. This is one of the reasons why in the definition
of the sample variance, instead of dividing by $n$, we divide by $(n - 1)$.


Theorem 5.4.2 If $S^2$ is the variance of a random sample from an infinite population with finite variance
$\sigma^2$, then $S^2$ is an unbiased estimator for $\sigma^2$.


Proof. Let $X_1, \ldots, X_n$ be iid random variables with variance $\sigma^2 < \infty$. We have

\[ \begin{aligned} E(S^2) &= \frac{1}{n-1} E\left[ \sum_{i=1}^{n} (X_i - \bar{X})^2 \right] = \frac{1}{n-1} E\left[ \sum_{i=1}^{n} \left\{ (X_i - \mu) - (\bar{X} - \mu) \right\}^2 \right] \\ &= \frac{1}{n-1} \left[ \sum_{i=1}^{n} E\{(X_i - \mu)^2\} - nE\{(\bar{X} - \mu)^2\} \right]. \end{aligned} \]

Because $E\{(X_i - \mu)^2\} = \sigma^2$ and $E\{(\bar{X} - \mu)^2\} = \sigma^2/n$, it follows that

\[ E(S^2) = \frac{1}{n-1} \left[ \sum_{i=1}^{n} \sigma^2 - n \frac{\sigma^2}{n} \right] = \sigma^2. \]

Hence, $S^2$ is an unbiased estimator of $\sigma^2$.
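Theorem 5.4.2 can be checked exactly on a small case by enumerating every equally likely iid sample. The sketch below is illustrative only: the three-point population `values` and the sample size are arbitrary choices, and exact rational arithmetic is used so that no simulation error enters.

```python
from fractions import Fraction
from itertools import product

def sample_variance(xs, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    mean = Fraction(sum(xs), len(xs))
    return sum((x - mean) ** 2 for x in xs) / divisor

def expected_value(statistic, values, n):
    """Exact E[statistic] over all equally likely iid samples of size n."""
    samples = list(product(values, repeat=n))
    return sum(statistic(s) for s in samples) / len(samples)

values = [1, 2, 6]     # hypothetical population values, each with probability 1/3
n = 3
mu = Fraction(sum(values), len(values))
sigma2 = sum((v - mu) ** 2 for v in values) / len(values)   # population variance

e_s2 = expected_value(lambda s: sample_variance(s, n - 1), values, n)   # divisor n - 1
e_mle = expected_value(lambda s: sample_variance(s, n), values, n)      # divisor n
```

With divisor $n - 1$ the expectation equals $\sigma^2$ exactly, while divisor $n$ gives $(n-1)\sigma^2/n$, so the divisor-$n$ version underestimates the variance on average.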


It is important to observe the following: 


1. S? is not an unbiased estimator of the variance of a finite population. 

2. Unbiasedness may not be retained under functional transformations; that is, if $\hat{\theta}$ is an unbiased
estimator of $\theta$, it does not follow that $f(\hat{\theta})$ is an unbiased estimator of $f(\theta)$.

3. Maximum likelihood estimators or moment estimators are not, in general, unbiased. 

4. In many cases it is possible to alter a biased estimator by multiplying by an appropriate constant 
to obtain an unbiased estimator. 


The following example will show that unbiased estimators need not be unique.


Example 5.4.2
Let $X_1, \ldots, X_n$ be a random sample from a population with finite mean $\mu$. Show that the sample mean $\bar{X}$
and $\frac{1}{3}\bar{X} + \frac{2}{3}X_1$ are both unbiased estimators of $\mu$.


Solution
By Theorem 5.4.1, $\bar{X}$ is unbiased. Now

\[ E\left( \frac{1}{3}\bar{X} + \frac{2}{3}X_1 \right) = \frac{1}{3}E(\bar{X}) + \frac{2}{3}E(X_1) = \frac{1}{3}\mu + \frac{2}{3}\mu = \mu. \]

Hence, $\frac{1}{3}\bar{X} + \frac{2}{3}X_1$ is also an unbiased estimator of $\mu$.


How many unbiased estimators can we find? In fact, the following example shows that if we have 
two unbiased estimators, there are infinitely many unbiased estimators. 


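Example 5.4.3
Let $\hat{\theta}_1$ and $\hat{\theta}_2$ be two unbiased estimators of $\theta$. Show that

\[ \hat{\theta}_3 = a\hat{\theta}_1 + (1 - a)\hat{\theta}_2, \quad 0 \le a \le 1 \]

is an unbiased estimator of $\theta$. Note that $\hat{\theta}_3$ is a convex combination of $\hat{\theta}_1$ and $\hat{\theta}_2$. In addition, assume that
$\hat{\theta}_1$ and $\hat{\theta}_2$ are independent, and $Var(\hat{\theta}_1) = \sigma_1^2$ and $Var(\hat{\theta}_2) = \sigma_2^2$. How should the constant $a$ be chosen in
order to minimize the variance of $\hat{\theta}_3$?


Solution
We are given that $E(\hat{\theta}_1) = \theta$ and $E(\hat{\theta}_2) = \theta$. Therefore,

\[ E(\hat{\theta}_3) = E\left[ a\hat{\theta}_1 + (1 - a)\hat{\theta}_2 \right] = aE\hat{\theta}_1 + (1 - a)E\hat{\theta}_2 = a\theta + (1 - a)\theta = \theta. \]

Hence $\hat{\theta}_3$ is unbiased. By independence,

\[ Var(\hat{\theta}_3) = Var\left[ a\hat{\theta}_1 + (1 - a)\hat{\theta}_2 \right] = a^2 Var(\hat{\theta}_1) + (1 - a)^2 Var(\hat{\theta}_2) = a^2\sigma_1^2 + (1 - a)^2\sigma_2^2. \]

To find the minimum,

\[ \frac{d}{da} Var(\hat{\theta}_3) = 2a\sigma_1^2 - 2(1 - a)\sigma_2^2 = 0, \]

gives us

\[ a = \frac{\sigma_2^2}{\sigma_1^2 + \sigma_2^2}. \]

Because $\frac{d^2}{da^2} Var(\hat{\theta}_3) = 2\sigma_1^2 + 2\sigma_2^2 > 0$, $Var(\hat{\theta}_3)$ has a minimum at this value of $a$. Thus, if $\sigma_1^2 = \sigma_2^2$, then
$a = 1/2$.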
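The weight derived in Example 5.4.3 can be checked against a direct scan over $a$. This is a small sketch with hypothetical variances; the function names are ours.

```python
def combined_variance(a, var1, var2):
    """Var(a*T1 + (1 - a)*T2) for independent estimators T1 and T2."""
    return a * a * var1 + (1.0 - a) ** 2 * var2

def optimal_weight(var1, var2):
    """Weight a = var2 / (var1 + var2) that minimizes the combined variance."""
    return var2 / (var1 + var2)

var1, var2 = 1.0, 3.0                      # hypothetical variances
a_star = optimal_weight(var1, var2)
grid = [k / 100 for k in range(101)]       # candidate weights 0.00, 0.01, ..., 1.00
best_on_grid = min(grid, key=lambda a: combined_variance(a, var1, var2))
```

The more precise estimator receives the larger weight: here the first estimator has the smaller variance and gets weight 0.75.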


Example 5.4.4
Let $X_1, \ldots, X_n$ be a random sample from a population with pdf

\[ f(x) = \begin{cases} \dfrac{1}{\beta} e^{-x/\beta}, & x > 0 \\ 0, & \text{otherwise.} \end{cases} \]

Show that the method of moments estimator for the population parameter $\beta$ is unbiased.


Solution
From Section 5.2, we have seen that the method of moments estimator for $\beta$ is the sample mean $\bar{X}$, and
the population mean is $\beta$. Because $E(\bar{X}) = \mu = \beta$, the method of moments estimator for the population
parameter $\beta$ is unbiased.


As we have seen, there can be many unbiased estimators of a parameter $\theta$. Which one of these
estimators should we choose? Among unbiased estimators, it is desirable to choose
the one with the least variance. More generally, when estimators may be biased, we prefer one that has
both low bias and low variance. This leads us to the following definition.


Definition 5.4.2 The mean square error of the estimator $\hat{\theta}$, denoted by $MSE(\hat{\theta})$, is defined as

\[ MSE(\hat{\theta}) = E(\hat{\theta} - \theta)^2. \]

Through the following calculations, we will now show that the MSE is a measure that combines both
bias and variance.

\[ \begin{aligned} MSE(\hat{\theta}) &= E(\hat{\theta} - \theta)^2 = E\left[ (\hat{\theta} - E(\hat{\theta})) + (E(\hat{\theta}) - \theta) \right]^2 \\ &= E\left[ \hat{\theta} - E(\hat{\theta}) \right]^2 + \left[ E(\hat{\theta}) - \theta \right]^2 + 2E\left[ (\hat{\theta} - E(\hat{\theta}))(E(\hat{\theta}) - \theta) \right] \\ &= Var(\hat{\theta}) + \left[ E(\hat{\theta}) - \theta \right]^2, \end{aligned} \]

because the cross term vanishes: $E(\hat{\theta} - E(\hat{\theta}))(E(\hat{\theta}) - \theta) = 0$. Letting $B = E(\hat{\theta}) - \theta$, we get

\[ MSE(\hat{\theta}) = Var(\hat{\theta}) + B^2. \]

$B$ is called the bias of the estimator.


Because the bias is zero for unbiased estimators, it is clear that $MSE(\hat{\theta}) = Var(\hat{\theta})$ for such estimators. Mean square error
measures, on average, how close an estimator comes to the true value of the parameter. Hence, this
could be used as a criterion for determining when one estimator is "better" than another. However,
in general, it is difficult to find $\hat{\theta}$ to minimize $MSE(\hat{\theta})$. For this reason, most of the time, we look only
at unbiased estimators in order to minimize $Var(\hat{\theta})$. This leads to the following definition.


Definition 5.4.3 The unbiased estimator $\hat{\theta}$ that minimizes the mean square error is called the minimum
variance unbiased estimator (MVUE) of $\theta$.
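The identity $MSE(\hat{\theta}) = Var(\hat{\theta}) + B^2$ can be verified exactly on a small discrete case. The sketch below is an assumed illustration, not from the text: it estimates the upper end $\theta = 4$ of a discrete uniform population on $\{1, 2, 3, 4\}$ by the maximum of a sample of size 2.

```python
from fractions import Fraction
from itertools import product

def mse_decomposition(estimator, values, n, theta):
    """Exact (MSE, variance, bias) of `estimator` over all equally likely iid samples."""
    estimates = [Fraction(estimator(s)) for s in product(values, repeat=n)]
    count = len(estimates)
    mean_est = sum(estimates) / count
    variance = sum((e - mean_est) ** 2 for e in estimates) / count
    bias = mean_est - theta
    mse = sum((e - theta) ** 2 for e in estimates) / count
    return mse, variance, bias

values = [1, 2, 3, 4]      # hypothetical discrete uniform population
theta = 4                  # parameter being estimated: the largest possible value
mse, variance, bias = mse_decomposition(max, values, 2, theta)
```

The sample maximum never exceeds $\theta$, so its bias is negative, and the squared bias accounts for part of the mean square error.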


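Example 5.4.5
Let $X_1, X_2, X_3$ be a sample of size $n = 3$ from a distribution with unknown mean $\mu$, $-\infty < \mu < \infty$, where
the variance $\sigma^2$ is a known positive number. Show that both $\hat{\theta}_1 = \bar{X}$ and $\hat{\theta}_2 = (2X_1 + X_2 + 5X_3)/8$ are
unbiased estimators for $\mu$. Compare the variances of $\hat{\theta}_1$ and $\hat{\theta}_2$.


Solution
We have

\[ E(\hat{\theta}_1) = E(\bar{X}) = \frac{1}{3} \cdot 3\mu = \mu \]

and

\[ E(\hat{\theta}_2) = \frac{1}{8} \left[ 2EX_1 + EX_2 + 5EX_3 \right] = \frac{1}{8} \left[ 2\mu + \mu + 5\mu \right] = \mu. \]

Hence, both $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators.
However,

\[ Var(\hat{\theta}_1) = \frac{\sigma^2}{3}, \]

whereas

\[ Var(\hat{\theta}_2) = Var\left( \frac{2X_1 + X_2 + 5X_3}{8} \right) = \frac{4}{64}\sigma^2 + \frac{1}{64}\sigma^2 + \frac{25}{64}\sigma^2 = \frac{30}{64}\sigma^2. \]

Because $Var(\hat{\theta}_1) < Var(\hat{\theta}_2)$, we see that $\bar{X}$ is a better unbiased estimator in the sense that the variance of $\bar{X}$
is smaller.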
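For linear estimators of the form $\sum w_i X_i$ with independent observations of common variance $\sigma^2$, unbiasedness for $\mu$ requires $\sum w_i = 1$ and the variance is $\sigma^2 \sum w_i^2$. The sketch below applies this to the two estimators of Example 5.4.5; the helper name is ours.

```python
from fractions import Fraction

def linear_estimator_summary(weights):
    """Return (sum of weights, sum of squared weights) for sum(w_i * X_i).

    The first value must be 1 for unbiasedness; the second multiplies sigma^2
    to give the variance when the X_i are independent with common variance.
    """
    return sum(weights), sum(w * w for w in weights)

mean_weights = [Fraction(1, 3)] * 3                            # theta-hat-1 = X-bar
alt_weights = [Fraction(2, 8), Fraction(1, 8), Fraction(5, 8)]  # theta-hat-2

total_1, var_factor_1 = linear_estimator_summary(mean_weights)
total_2, var_factor_2 = linear_estimator_summary(alt_weights)
```

Both weight vectors sum to 1, and the variance factors are 1/3 and 30/64 = 15/32, in agreement with the hand calculation.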
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It is important to observe that the maximum likelihood estimators are not always unbiased, but it 
can be shown that for such estimators the bias goes to zero as the sample size increases. 


5.4.2 Sufficiency 


In statistical inference problems on a parameter, one of the major questions is: Can a specific
statistic replace the entire data without losing pertinent information? Suppose $X_1, \ldots, X_n$ is a random
sample from a probability distribution with unknown parameter $\theta$. In general, statisticians look for
ways of reducing a set of data so that these data can be more easily understood without losing the
meaning associated with the entire collection of observations. Intuitively, a statistic $U$ is a sufficient
statistic for a parameter $\theta$ if $U$ contains all the information available in the data about the value of $\theta$.
For example, the sample mean may contain all the relevant information about the parameter $\mu$, and
in that case $U = \bar{X}$ is called a sufficient statistic for $\mu$. An estimator that is a function of a sufficient
statistic can be deemed to be a "good" estimator, because it depends on fewer data values. When
we have a sufficient statistic $U$ for $\theta$, we need to concentrate only on $U$ because it exhausts all the
information that the sample has about $\theta$. That is, knowledge of the actual $n$ observations does not
contribute anything more to the inference about $\theta$.


Definition 5.4.4 Let $X_1, \ldots, X_n$ be a random sample from a probability distribution with unknown parameter
$\theta$. Then, the statistic $U = g(X_1, \ldots, X_n)$ is said to be sufficient for $\theta$ if the conditional pdf or pf of
$X_1, \ldots, X_n$ given $U = u$ does not depend on $\theta$ for any value of $u$. An estimator of $\theta$ that is a function
of a sufficient statistic for $\theta$ is said to be a sufficient estimator of $\theta$.


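Example 5.4.6
Let $X_1, \ldots, X_n$ be iid Bernoulli random variables with parameter $\theta$. Show that $U = \sum_{i=1}^{n} X_i$ is sufficient
for $\theta$.

Solution
The joint probability mass function of $X_1, \ldots, X_n$ is

\[ f(x_1, \ldots, x_n; \theta) = \theta^{\sum_{i=1}^{n} x_i} (1 - \theta)^{n - \sum_{i=1}^{n} x_i}, \quad 0 \le \theta \le 1. \]

Because $U = \sum_{i=1}^{n} X_i$ we have

\[ f(x_1, \ldots, x_n; \theta) = \theta^{u} (1 - \theta)^{n-u}, \quad 0 \le u \le n. \]

Also, because $U \sim B(n, \theta)$, we have

\[ f_U(u; \theta) = \binom{n}{u} \theta^{u} (1 - \theta)^{n-u}. \]

Also,

\[ f(x_1, \ldots, x_n \mid U = u) = \begin{cases} \dfrac{f(x_1, \ldots, x_n, u)}{f_U(u)}, & u = \sum x_i \\ 0, & \text{otherwise.} \end{cases} \]

Therefore,

\[ f(x_1, \ldots, x_n \mid U = u) = \begin{cases} \dfrac{\theta^{u} (1 - \theta)^{n-u}}{\binom{n}{u} \theta^{u} (1 - \theta)^{n-u}} = \dfrac{1}{\binom{n}{u}}, & \text{if } u = \sum x_i \\ 0, & \text{otherwise,} \end{cases} \]

which is independent of $\theta$. Therefore $U$ is sufficient for $\theta$.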
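The conclusion of Example 5.4.6 can be confirmed by brute force for a small $n$: the conditional probability of each 0/1 sequence given its sum should be $1/\binom{n}{u}$ whatever the value of $\theta$. The following sketch uses exact fractions; the function name and the two trial values of $\theta$ are assumptions for illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb

def conditional_pmf_given_sum(theta, n):
    """Map each 0/1 sequence x to P(X = x | U = sum(x)) for iid Bernoulli(theta)."""
    joint = {}
    marginal = {}
    for x in product((0, 1), repeat=n):
        u = sum(x)
        p = theta ** u * (1 - theta) ** (n - u)   # joint pmf of the sequence
        joint[x] = p
        marginal[u] = marginal.get(u, 0) + p      # accumulates P(U = u)
    return {x: joint[x] / marginal[sum(x)] for x in joint}

n = 4
cond_a = conditional_pmf_given_sum(Fraction(1, 4), n)
cond_b = conditional_pmf_given_sum(Fraction(2, 3), n)
```

The two dictionaries coincide even though the parameter values differ, which is exactly what sufficiency of the sum requires.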


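Example 5.4.7
Let $X_1, \ldots, X_n$ be a random sample from $U(0, \theta)$. That is,

\[ f(x) = \begin{cases} \dfrac{1}{\theta}, & \text{if } 0 < x < \theta \\ 0, & \text{otherwise.} \end{cases} \]

Show that $U = \max_{1 \le i \le n} X_i$ is sufficient for $\theta$.


Solution
The joint density or the likelihood function is given by

\[ f(x_1, \ldots, x_n; \theta) = \begin{cases} \dfrac{1}{\theta^{n}}, & \text{if } 0 < x_1, \ldots, x_n < \theta \\ 0, & \text{otherwise.} \end{cases} \]

The joint pdf $f(x_1, \ldots, x_n; \theta)$ can be equivalently written as

\[ f(x_1, \ldots, x_n; \theta) = \begin{cases} \dfrac{1}{\theta^{n}}, & \text{if } x_{\min} > 0, \ x_{\max} < \theta \\ 0, & \text{otherwise.} \end{cases} \]

Now, we can compute the pdf of $U$. For $0 < u < \theta$,

\[ F(u) = P(U \le u) = P(X_1, \ldots, X_n \le u) = \prod_{i=1}^{n} P(X_i \le u) \ \text{(because of independence)} \ = \frac{u^{n}}{\theta^{n}}. \]

The pdf of $U$ may now be obtained as

\[ f_U(u) = \frac{d}{du} F(u) = \frac{n u^{n-1}}{\theta^{n}}, \quad 0 < u < \theta. \]

Moreover,

\[ f(x_1, \ldots, x_n \mid u) = \begin{cases} \dfrac{f(x_1, \ldots, x_n; \theta)}{f_U(u)}, & \text{if } u = x_{\max} \text{ and } x_{\min} > 0 \\ 0, & \text{otherwise.} \end{cases} \]

Using the expressions for $f(x_1, \ldots, x_n; \theta)$ and $f_U(u)$ we obtain

\[ f(x_1, \ldots, x_n \mid U = u) = \begin{cases} \dfrac{1}{n u^{n-1}}, & \text{if } u = x_{\max} \text{ and } x_{\min} > 0 \\ 0, & \text{otherwise.} \end{cases} \]

$f(x_1, \ldots, x_n \mid u)$ is a function of $u$ and $x_{\min}$, which is independent of $\theta$. Hence, $U = \max_{1 \le i \le n} X_i$ is sufficient
for $\theta$.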
The outcome $X_1, \ldots, X_n$ is always sufficient, but we will exclude this trivial statistic from
consideration. In the previous two examples, we were given a statistic and asked to check whether it was
sufficient. It can often be tedious to check whether a statistic is sufficient for a given parameter based
directly on the foregoing definition. If the form of the statistic is not given, how do we guess what
the sufficient statistic is? Working out the conditional probability by hand for each of our
guesses would, in general, be a tedious way to go about finding sufficient statistics. Fortunately, the
Neyman-Fisher factorization theorem makes it easier to spot a sufficient statistic. The following result
will give us a convenient way of verifying sufficiency of a statistic through the likelihood function.


NEYMAN-FISHER FACTORIZATION CRITERIA 

Theorem 5.4.3 Let $U$ be a statistic based on the random sample $X_1, \ldots, X_n$. Then, $U$ is a sufficient statistic
for $\theta$ if and only if the joint pdf (or pf) $f(x_1, \ldots, x_n; \theta)$ (which depends on the parameter $\theta$) can be factored
into two nonnegative functions,

\[ f(x_1, \ldots, x_n; \theta) = g(u, \theta) \, h(x_1, \ldots, x_n) \quad \text{for all } x_1, \ldots, x_n, \]

where $g(u, \theta)$ is a function only of $u$ and $\theta$ and $h(x_1, \ldots, x_n)$ is a function of only $x_1, \ldots, x_n$ and not of $\theta$.


Proof. (Discrete case.) We will only give the proof in the discrete case, even though the result is also
true for the continuous case. First suppose that $U(X_1, \ldots, X_n)$ is sufficient for $\theta$. Then, $X_1 = x_1, X_2 =
x_2, \ldots, X_n = x_n$ if and only if $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$ and $U(X_1, \ldots, X_n) = U(x_1, \ldots, x_n) =
u$ (say). Therefore

\[ \begin{aligned} f(x_1, \ldots, x_n; \theta) &= P_\theta(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \text{ and } U = u) \\ &= P_\theta(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid U = u) \, P_\theta(U = u). \end{aligned} \]

Because $U$ is assumed to be sufficient for $\theta$, the conditional probability $P_\theta(X_1 = x_1, X_2 =
x_2, \ldots, X_n = x_n \mid U = u)$ does not depend on $\theta$. Let us denote this conditional probability by
$h(x_1, \ldots, x_n)$. Clearly $P_\theta(U = u)$ is a function of $u$ and $\theta$. Let us denote this by $g(u, \theta)$.

It now follows from the equation above that

\[ f(x_1, \ldots, x_n; \theta) = g(u, \theta) \, h(x_1, \ldots, x_n) \]

as was to be shown.

To prove the converse, assume that

\[ f(x_1, \ldots, x_n; \theta) = g(u, \theta) \, h(x_1, \ldots, x_n). \]

Define the set $A_u$ by

\[ A_u = \{ (x_1, \ldots, x_n) : U(x_1, \ldots, x_n) = u \}. \]

That is, $A_u$ is the set of all $(x_1, \ldots, x_n)$ such that $U$ maps it into $u$. We note that $A_u$ does not depend
on $\theta$. Now

\[ \begin{aligned} P_\theta(X_1 = x_1, \ldots, X_n = x_n \mid U = u) &= \frac{P_\theta(X_1 = x_1, \ldots, X_n = x_n \text{ and } U = u)}{P_\theta(U = u)} \\ &= \begin{cases} \dfrac{P_\theta(X_1 = x_1, \ldots, X_n = x_n)}{P_\theta(U = u)}, & \text{if } (x_1, \ldots, x_n) \in A_u \\ 0, & \text{if } (x_1, \ldots, x_n) \notin A_u. \end{cases} \end{aligned} \]

If $(x_1, \ldots, x_n) \notin A_u$, then, clearly,

\[ P_\theta(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n \mid U = u) = 0, \]

which is independent of $\theta$.

If $(x_1, \ldots, x_n) \in A_u$, then, using the factorization criterion, we obtain

\[ \begin{aligned} P_\theta(X_1 = x_1, \ldots, X_n = x_n \mid U = u) &= \frac{P_\theta(X_1 = x_1, \ldots, X_n = x_n)}{P_\theta(U = u)} = \frac{f(x_1, \ldots, x_n; \theta)}{P_\theta(U = u)} \\ &= \frac{g(u, \theta) \, h(x_1, \ldots, x_n)}{\sum_{(x_1, \ldots, x_n) \in A_u} g(u, \theta) \, h(x_1, \ldots, x_n)} \\ &= \frac{g(u, \theta) \, h(x_1, \ldots, x_n)}{g(u, \theta) \sum_{A_u} h(x_1, \ldots, x_n)} = \frac{h(x_1, \ldots, x_n)}{\sum_{A_u} h(x_1, \ldots, x_n)}. \end{aligned} \]

Therefore, the conditional distribution of $X_1, \ldots, X_n$ given $U$ does not depend on $\theta$, proving that $U$
is sufficient.


One can use the following procedure to verify that a given statistic is sufficient. This procedure is 
based on factorization criteria rather than using the definition of sufficiency directly. 


PROCEDURE TO VERIFY SUFFICIENCY 
1. Obtain the joint pdf or pf $f_\theta(x_1, \ldots, x_n)$.
2. If necessary, rewrite the joint pdf or pf in terms of the given statistic and parameter so that one can 
use the factorization theorem. 
3. Define the functions g and h, in such a way that g is a function of the statistic and parameter only 
and h is a function of the observations only. 
4. If step 3 is possible, then the statistic is sufficient. Otherwise, it is not sufficient. 


In general, it is not easy to use the factorization criterion to show that a statistic U is not sufficient. 
We now give some examples using the factorization theorem. 


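Example 5.4.8
Let $X_1, \ldots, X_n$ denote a random sample from a geometric population with parameter $p$. Show that $\bar{X}$ is
sufficient for $p$.


Solution
For the geometric distribution, the pf is given by

\[ f(x, p) = \begin{cases} p(1 - p)^{x-1}, & x \ge 1 \\ 0, & \text{otherwise.} \end{cases} \]

Hence, the joint pf is

\[ f(x_1, \ldots, x_n; p) = p^{n} (1 - p)^{-n + \sum_{i=1}^{n} x_i} = \begin{cases} p^{n} (1 - p)^{n\bar{x} - n}, & \text{if } x_1, \ldots, x_n \ge 1 \\ 0, & \text{otherwise.} \end{cases} \]

Take

\[ g(\bar{x}, p) = p^{n} (1 - p)^{n\bar{x} - n} \quad \text{and} \quad h(x_1, \ldots, x_n) = \begin{cases} 1, & \text{if } x_i \ge 1 \\ 0, & \text{otherwise.} \end{cases} \]

Thus, $\bar{X}$ is sufficient for $p$.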
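One practical reading of the factorization in Example 5.4.8 is that two samples with the same mean have identical likelihoods for every $p$. The sketch below checks this with exact fractions; the two samples and the trial values of $p$ are hypothetical.

```python
from fractions import Fraction

def geometric_joint_pf(xs, p):
    """Joint pf of a geometric sample: product of p(1 - p)^(x_i - 1), zero if any x_i < 1."""
    if any(x < 1 for x in xs):
        return Fraction(0)
    result = Fraction(1)
    for x in xs:
        result *= p * (1 - p) ** (x - 1)
    return result

def g_factor(x_bar, n, p):
    """g(x_bar, p) = p^n (1 - p)^(n*x_bar - n); n*x_bar is the integer sample total."""
    exponent = int(n * x_bar) - n
    return p ** n * (1 - p) ** exponent

sample_a = [1, 3, 2, 5, 1, 4]     # hypothetical sample, total 16
sample_b = [6, 1, 1, 1, 1, 6]     # different sample with the same total
x_bar = Fraction(sum(sample_a), len(sample_a))
p_values = [Fraction(1, 5), Fraction(1, 2), Fraction(7, 10)]
```

For each trial $p$, `geometric_joint_pf(sample_a, p)`, `geometric_joint_pf(sample_b, p)`, and `g_factor(x_bar, 6, p)` agree, since $h$ equals 1 for both samples.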


Example 5.4.9 
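Let $X_1, \ldots, X_n$ denote a random sample from a $U(0, \theta)$ with pdf

\[ f_\theta(x) = \begin{cases} \dfrac{1}{\theta}, & 0 < x < \theta, \ \theta > 0 \\ 0, & \text{otherwise.} \end{cases} \]

Show that $X_{(n)} = \max_{1 \le i \le n} X_i$ is sufficient for $\theta$, using the factorization theorem.


Solution
The likelihood function of the sample is

\[ f_\theta(x_1, \ldots, x_n) = \begin{cases} \dfrac{1}{\theta^{n}}, & \text{if } 0 < x_1, \ldots, x_n < \theta \\ 0, & \text{otherwise.} \end{cases} \]

We can now write $f_\theta(x_1, \ldots, x_n)$ as

\[ f_\theta(x_1, \ldots, x_n) = h(x_1, \ldots, x_n) \, g(\theta, x_{(n)}) \quad \text{for all } x_1, \ldots, x_n, \]

where

\[ h(x_1, \ldots, x_n) = \begin{cases} 1, & \text{if } x_1, \ldots, x_n > 0 \\ 0, & \text{otherwise} \end{cases} \]

and

\[ g(\theta, x_{(n)}) = \begin{cases} \dfrac{1}{\theta^{n}}, & \text{if } 0 < x_{(n)} < \theta \\ 0, & \text{otherwise.} \end{cases} \]

From the factorization theorem, we now conclude that $X_{(n)}$ is sufficient for $\theta$. In the next definition, we
introduce the concept of joint sufficiency.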


Definition 5.4.5 Two statistics $U_1$ and $U_2$ are said to be jointly sufficient for the parameters $\theta_1$ and $\theta_2$
if the conditional distribution of $X_1, \ldots, X_n$ given $U_1$ and $U_2$ does not depend on $\theta_1$ or $\theta_2$. In general,
the statistic $U = (U_1, \ldots, U_n)$ is jointly sufficient for $\theta = (\theta_1, \ldots, \theta_n)$ if the conditional distribution of
$X_1, \ldots, X_n$ given $U$ is free of $\theta$.


Now we state the factorization criteria for joint sufficiency analogous to the single population
parameter case.

THE FACTORIZATION CRITERIA FOR JOINT SUFFICIENCY
Theorem 5.4.4 The two statistics $U_1$ and $U_2$ are jointly sufficient for $\theta_1$ and $\theta_2$ if and only if the likelihood
function can be factored into two nonnegative functions,

\[ f(x_1, \ldots, x_n; \theta_1, \theta_2) = g(u_1, u_2; \theta_1, \theta_2) \, h(x_1, \ldots, x_n) \]

where $g(u_1, u_2; \theta_1, \theta_2)$ is only a function of $u_1$, $u_2$, $\theta_1$, and $\theta_2$, and $h(x_1, \ldots, x_n)$ is free of $\theta_1$ or $\theta_2$.



Example 5.4.10 
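Let $X_1, \ldots, X_n$ be a random sample from $N(\mu, \sigma^2)$.

(a) If $\mu$ is unknown and $\sigma^2 = \sigma_0^2$ is known, show that $\bar{X}$ is a sufficient statistic for $\mu$.
(b) If $\mu = \mu_0$ is known and $\sigma^2$ is unknown, show that $\sum_{i=1}^{n} (X_i - \mu_0)^2$ is sufficient for $\sigma^2$.
(c) If $\mu$ and $\sigma^2$ are both unknown, show that $\sum_{i=1}^{n} X_i$ and $\sum_{i=1}^{n} X_i^2$ are jointly sufficient for $\mu$ and $\sigma^2$.


Solution
The likelihood function of the sample is

\[ \begin{aligned} f(x_1, \ldots, x_n; \mu, \sigma^2) &= \frac{1}{(2\pi)^{n/2} \sigma^{n}} \exp\left( -\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2} \right) \\ &= \frac{1}{(2\pi)^{n/2} \sigma^{n}} \exp\left( -\frac{\sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2}{2\sigma^2} \right). \end{aligned} \]

(a) When $\sigma^2 = \sigma_0^2$ is known, use the factorization criteria, with

\[ g(\bar{x}, \mu) = \exp\left( \frac{2n\mu\bar{x} - n\mu^2}{2\sigma_0^2} \right) \]

and

\[ h(x_1, \ldots, x_n) = (2\pi)^{-n/2} \sigma_0^{-n} \exp\left( -\frac{\sum_{i=1}^{n} x_i^2}{2\sigma_0^2} \right). \]

Therefore, $\bar{X}$ is sufficient for $\mu$.

(b) When $\mu = \mu_0$ is known, let

\[ g\left( \sum_{i=1}^{n} (x_i - \mu_0)^2, \sigma^2 \right) = \sigma^{-n} \exp\left( -\frac{\sum_{i=1}^{n} (x_i - \mu_0)^2}{2\sigma^2} \right) \]

and

\[ h(x_1, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}}. \]

Thus, $\sum_{i=1}^{n} (X_i - \mu_0)^2$ is sufficient for $\sigma^2$.

(c) When both $\mu$ and $\sigma^2$ are unknown, use

\[ g\left( \sum_{i=1}^{n} x_i, \sum_{i=1}^{n} x_i^2, \mu, \sigma^2 \right) = \sigma^{-n} \exp\left( -\frac{\sum_{i=1}^{n} x_i^2 - 2\mu \sum_{i=1}^{n} x_i + n\mu^2}{2\sigma^2} \right) \]

and

\[ h(x_1, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}}. \]

Hence, $\sum_{i=1}^{n} X_i$ and $\sum_{i=1}^{n} X_i^2$ are jointly sufficient for $\mu$ and $\sigma^2$.
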


Example 5.4.11 
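Suppose that we have a random sample $X_1, \ldots, X_n$ from a discrete distribution given by

\[ f_\theta(x) = C(\theta) \, 2^{-x/\theta}, \quad x = \theta, \theta + 1, \theta + 2, \ldots; \quad \theta > 0 \]

where $C(\theta) > 0$ is a normalizing constant. Using the factorization theorem, find a sufficient statistic for $\theta$.


Solution
The joint density function $f(x_1, \ldots, x_n; \theta)$ of the sample $X_1, \ldots, X_n$ is

\[ f(x_1, \ldots, x_n; \theta) = \begin{cases} C(\theta)^{n} \, 2^{-\sum_{i=1}^{n} (x_i/\theta)}, & x_1, x_2, \ldots, x_n \text{ are integers} \ge \theta \\ 0, & \text{otherwise.} \end{cases} \]

The function $f(x_1, \ldots, x_n; \theta)$ can be written as

\[ f(x_1, \ldots, x_n; \theta) = h(x_1, \ldots, x_n) \, C(\theta)^{n} \, 2^{-\sum_{i=1}^{n} (x_i/\theta)} \, g_1(\theta, x_{(1)}) \]

where $x_{(1)} = \min(x_1, \ldots, x_n)$, and

\[ h(x_1, x_2, \ldots, x_n) = \begin{cases} 1, & \text{if } x_j - x_{(1)} \ge 0 \text{ is an integer for } j = 1, 2, \ldots, n \\ 0, & \text{otherwise} \end{cases} \]

and

\[ g_1(\theta, x_{(1)}) = \begin{cases} 1, & \text{if } x_{(1)} \ge \theta \\ 0, & \text{otherwise.} \end{cases} \]

Thus,

\[ f(x_1, \ldots, x_n; \theta) = h(x_1, \ldots, x_n) \, g\left( \theta, \sum_{i=1}^{n} x_i, x_{(1)} \right) \]

where $g(\theta, \sum x_i, x_{(1)}) = C(\theta)^{n} \, 2^{-\sum_{i=1}^{n} (x_i/\theta)} \, g_1(\theta, x_{(1)})$. Using the factorization theorem, we conclude that
$(\sum x_i, x_{(1)})$ is jointly sufficient for $\theta$. This result shows that even for a single parameter, we may need
more than one statistic for sufficiency.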


When using the factorization criterion, one has to be careful in cases where the range space depends 
on the parameter. 


Using the factorization criterion, we can prove the following result, which says that if we have 
a unique maximum likelihood estimator, then that estimator will be a function of the sufficient 
statistic. 


Theorem 5.4.5 If $U$ is a sufficient statistic for $\theta$, the maximum likelihood estimator of $\theta$, if unique, is a function of $U$.


Proof. Because $U$ is sufficient, by Theorem 5.4.1, the joint pdf can be factored as

$$f(x_1, \ldots, x_n; \theta) = g(u, \theta)\, h(x_1, \ldots, x_n).$$

This depends on $\theta$ only through the statistic $U$. To maximize $L$ we need to maximize $g(U, \theta)$.


Many common distributions such as Poisson, normal, gamma, and Bernoulli are members of the
exponential family of probability distributions. The exponential family of distributions has density
functions of the form

$$f_\theta(x) = \begin{cases} \exp[k(x)c(\theta) + S(x) + d(\theta)], & \text{if } x \in B \\ 0, & x \notin B \end{cases}$$

where $B$ does not depend on the parameter $\theta$.

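Example 5.4.12
Write the following in exponential form.

(a) $\dfrac{e^{-\lambda}\lambda^x}{x!}$
(b) $p^x(1 - p)^{1-x}$
(c) $\dfrac{1}{\sqrt{2\pi}}\, e^{-(x-\mu)^2/2}$

Solution
(a) We have

$$\frac{e^{-\lambda}\lambda^x}{x!} = \exp[x \ln\lambda - \ln x! - \lambda].$$

Here $k(x) = x$, $c(\lambda) = \ln\lambda$, $S(x) = -\ln(x!)$, and $d(\lambda) = -\lambda$.

(b) Similarly,

$$p^x(1 - p)^{1-x} = \exp\left[x \ln\left(\frac{p}{1-p}\right) + \ln(1 - p)\right], \quad x = 0 \text{ or } 1.$$

(c) This is the normal density with mean $\mu$ and unit variance.

$$\frac{1}{\sqrt{2\pi}}\, e^{-(x-\mu)^2/2} = \exp\left[x\mu - \frac{x^2}{2} - \frac{\mu^2}{2} - \frac{1}{2}\ln(2\pi)\right], \quad -\infty < x < \infty.$$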
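The three rewritings in Example 5.4.12 can be checked numerically. The following is a minimal sketch, not part of the original text: the parameter values and the helper name `exp_form` are illustrative assumptions, and in part (c) the split of the exponent into $S(x)$ and $d(\mu)$ is one possible choice.

```python
import math

lam, p, mu = 2.5, 0.3, 1.2  # illustrative parameter values (assumed)

def exp_form(x, k, c, S, d):
    """Evaluate exp[k(x) c(theta) + S(x) + d(theta)]; c and d are already numbers."""
    return math.exp(k(x) * c + S(x) + d)

def poisson_direct(x):
    return math.exp(-lam) * lam**x / math.factorial(x)

def poisson_exp(x):
    # k(x) = x, c = ln(lam), S(x) = -ln(x!), d = -lam
    return exp_form(x, lambda t: t, math.log(lam),
                    lambda t: -math.lgamma(t + 1), -lam)

def bernoulli_direct(x):
    return p**x * (1 - p) ** (1 - x)

def bernoulli_exp(x):
    # k(x) = x, c = ln(p / (1 - p)), S(x) = 0, d = ln(1 - p)
    return exp_form(x, lambda t: t, math.log(p / (1 - p)),
                    lambda t: 0.0, math.log(1 - p))

def normal_direct(x):
    return math.exp(-((x - mu) ** 2) / 2) / math.sqrt(2 * math.pi)

def normal_exp(x):
    # k(x) = x, c = mu, S(x) = -x^2/2 - ln(2*pi)/2, d = -mu^2/2 (assumed split)
    return exp_form(x, lambda t: t, mu,
                    lambda t: -t * t / 2 - math.log(2 * math.pi) / 2,
                    -mu * mu / 2)

# Largest absolute gap between the direct form and the exponential form.
gaps = [abs(poisson_direct(x) - poisson_exp(x)) for x in range(10)]
gaps += [abs(bernoulli_direct(x) - bernoulli_exp(x)) for x in (0, 1)]
gaps += [abs(normal_direct(x) - normal_exp(x)) for x in (-2.0, 0.0, 0.7, 3.1)]
max_gap = max(gaps)
print("largest discrepancy:", max_gap)
```

In every case $k(x) = x$, which is why $\sum X_i$ appears as the sufficient statistic for each of the three families.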


Note that in the previous example, for each of the cases, $\sum_{i=1}^n X_i$ is a sufficient statistic for the
parameter. In the next result, we give a generalization of this fact.


Theorem 5.4.6 Let $X_1, \ldots, X_n$ be a random sample from a population with pdf or pmf of the exponential
form

$$f_\theta(x) = \begin{cases} \exp[k(x)c(\theta) + S(x) + d(\theta)], & x \in B \\ 0, & x \notin B \end{cases}$$

where $B$ does not depend on the parameter $\theta$. The statistic $\sum_{i=1}^n k(X_i)$ is sufficient for $\theta$.


Proof. The joint density is

$$f(x_1, \ldots, x_n; \theta) = \exp\left[c(\theta)\sum_{i=1}^n k(x_i) + \sum_{i=1}^n S(x_i) + n\,d(\theta)\right] = \left\{\exp\left[c(\theta)\sum_{i=1}^n k(x_i) + n\,d(\theta)\right]\right\}\left\{\exp\left[\sum_{i=1}^n S(x_i)\right]\right\}.$$

Using the factorization theorem, the statistic $\sum_{i=1}^n k(X_i)$ is sufficient.

It does not follow that every function of a sufficient statistic is sufficient. However, any one-to-one
function of a sufficient statistic is also sufficient. Not every statistic is sufficient. When they
do exist, sufficient estimators are very important, because if one can find a sufficient estimator it
is ordinarily possible to find an unbiased estimator based on the sufficient statistic. Actually, the
following theorem shows that if one is searching for an unbiased estimator with minimal variance,
the search can be restricted to functions of a sufficient statistic.


RAO-BLACKWELL THEOREM
Theorem 5.4.7 Let $X_1, \ldots, X_n$ be a random sample with joint pf or pdf $f(x_1, \ldots, x_n; \theta)$ and let
$U = (U_1, \ldots, U_n)$ be jointly sufficient for $\theta = (\theta_1, \ldots, \theta_n)$. If $T$ is any unbiased estimator of $k(\theta)$, and if
$T^* = E(T|U)$, then:
(a) $T^*$ is an unbiased estimator of $k(\theta)$.
(b) $T^*$ is a function of $U$, and does not depend on $\theta$.
(c) $\mathrm{Var}(T^*) \le \mathrm{Var}(T)$ for every $\theta$, and $\mathrm{Var}(T^*) < \mathrm{Var}(T)$ for some $\theta$ unless $T^* = T$ with probability 1.


Proof.
(a) By the property of conditional expectation and by the fact that $T$ is an unbiased estimator of $k(\theta)$,

$$E(T^*) = E(E(T|U)) = E(T) = k(\theta).$$

Hence, $T^*$ is an unbiased estimator of $k(\theta)$.

(b) Because $U$ is sufficient for $\theta$, the conditional distribution of any statistic (hence, of $T$), given $U$,
does not depend on $\theta$. Thus, $T^* = E(T|U)$ is a function of $U$.

(c) From the law of total variance, a property of conditional expectation, we have the following:

$$\mathrm{Var}(T) = E(\mathrm{Var}(T|U)) + \mathrm{Var}(E(T|U)) = E(\mathrm{Var}(T|U)) + \mathrm{Var}(T^*).$$

Because $\mathrm{Var}(T|U) \ge 0$ for all $u$, it follows that $E(\mathrm{Var}(T|U)) \ge 0$. Hence, $\mathrm{Var}(T^*) \le \mathrm{Var}(T)$. We
note that $\mathrm{Var}(T^*) = \mathrm{Var}(T)$ if and only if $\mathrm{Var}(T|U) = 0$, that is, $T$ is a function of $U$, in which case
$T^* = T$ (from the definition, $T^* = E(T|U) = T$).


In particular, if $k(\theta) = \theta$, and $T$ is an unbiased estimator of $\theta$, then $T^* = E(T|U)$ will typically give
the MVUE of $\theta$. If $T$ is the sufficient statistic that best summarizes the data from a given distribution
with parameter $\theta$, and we can find some function $g$ of $T$ such that $E(g(T)) = \theta$, it follows from the
Rao-Blackwell theorem that $g(T)$ is the UMVUE for $\theta$.
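The variance reduction promised by the Rao-Blackwell theorem can be seen in a small simulation. The sketch below is an illustration only and rests on assumptions that are not in the text: a Bernoulli($p$) sample, the crude unbiased estimator $T = X_1$, and the sufficient statistic $U = \sum X_i$, for which $E(X_1 \mid U) = U/n$ by symmetry. The function name and the default values are hypothetical.

```python
import random
import statistics

def simulate_rao_blackwell(p=0.3, n=10, reps=20000, seed=1):
    """Compare T = X_1 with T* = E(T | sum X_i) = (sum X_i) / n by simulation."""
    rng = random.Random(seed)
    t_values, t_star_values = [], []
    for _ in range(reps):
        sample = [1 if rng.random() < p else 0 for _ in range(n)]
        t_values.append(sample[0])             # T = X_1, unbiased for p
        t_star_values.append(sum(sample) / n)  # T* = U / n, also unbiased
    return (statistics.mean(t_values), statistics.variance(t_values),
            statistics.mean(t_star_values), statistics.variance(t_star_values))

mean_t, var_t, mean_t_star, var_t_star = simulate_rao_blackwell()
print(f"T : mean={mean_t:.3f}, variance={var_t:.4f}")
print(f"T*: mean={mean_t_star:.3f}, variance={var_t_star:.4f}")
```

Both sample means should sit near $p$, while the variance of $T^*$ should be roughly $1/n$ times that of $T$, in line with part (c) of the theorem.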


EXERCISES 5.4 
5.4.1. Let $X_1, \ldots, X_n$ be a random sample from a population with density

$$f(x) = \begin{cases} e^{-(x-\theta)}, & \text{for } x > \theta \\ 0, & \text{otherwise.} \end{cases}$$

(a) Show that $\bar{X}$ is a biased estimator of $\theta$.
(b) Show that $\bar{X}$ is an unbiased estimator of $\mu = 1 + \theta$.


5.4.2. The mean and variance of a finite population $\{a_1, \ldots, a_N\}$ are defined by

$$\mu = \frac{1}{N}\sum_{i=1}^N a_i \quad \text{and} \quad \sigma^2 = \frac{1}{N}\sum_{i=1}^N (a_i - \mu)^2.$$

For a finite population, show that the sample variance $S^2$ is a biased estimator of $\sigma^2$.


5.4.3. For an infinite population with finite variance $\sigma^2$, show that the sample standard deviation
$S$ is a biased estimator for $\sigma$. Find an unbiased estimator of $\sigma$. [We have seen that $S^2$ is an
unbiased estimator of $\sigma^2$. From this exercise, we see that a function of an unbiased estimator
need not be an unbiased estimator.]


5.4.4. Let $X_1, \ldots, X_n$ be a random sample from an infinite population with finite variance $\sigma^2$.
Define

$$S'^2 = \frac{1}{n}\sum_{i=1}^n (X_i - \bar{X})^2.$$

Show that $S'^2$ is a biased estimator for $\sigma^2$, and that the bias of $S'^2$ is $-\sigma^2/n$. Thus, $S'^2$ is
negatively biased, and so on average underestimates the variance. Note that $S'^2$ is the MLE
of $\sigma^2$.


5.4.5. Let $X_1, \ldots, X_n$ be a random sample from a population with the mean $\mu$. What condition
must be imposed on the constants $c_1, c_2, \ldots, c_n$ so that

$$c_1X_1 + c_2X_2 + \cdots + c_nX_n$$

is an unbiased estimator of $\mu$?


5.4.6. Let $X_1, \ldots, X_n$ be a random sample from a geometric distribution with parameter $\theta$. Find
an unbiased estimate of $\theta$.


5.4.7. Let $X_1, \ldots, X_n$ be a random sample from a $U(0, \theta)$ distribution. Let $Y_n = \max\{X_1, \ldots, X_n\}$.
We know (from Example 5.3.4) that $\hat\theta_1 = Y_n$ is a maximum likelihood estimator of $\theta$.

(a) Show that $\hat\theta_2 = 2\bar{X}$ is a method of moments estimator.
(b) Show that $\hat\theta_1$ is a biased estimator, and $\hat\theta_2$ is an unbiased estimator of $\theta$.
(c) Show that $\hat\theta_3 = \frac{n+1}{n}\hat\theta_1$ is an unbiased estimator of $\theta$.


5.4.8. Let $X_1, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance 1. Show
that $\hat\mu^2 = \bar{X}^2$ is a biased estimator of $\mu^2$, and compute the bias.


5.4.9. Let $X_1, \ldots, X_n$ be a random sample from an $N(\mu, \sigma^2)$ distribution. Show that the estimator
$\hat\mu = \bar{X}$ is the MVUE for $\mu$.

5.4.10. Let $X_1, \ldots, X_{n_1}$ be a random sample from an $N(\mu_1, \sigma^2)$ distribution and let $Y_1, \ldots, Y_{n_2}$ be
a random sample from an $N(\mu_2, \sigma^2)$ distribution. Show that the pooled estimator

$$\hat\sigma^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$

is unbiased for $\sigma^2$, where $S_1^2$ and $S_2^2$ are the respective sample variances.


5.4.11. Let $X_1, \ldots, X_n$ be a random sample from an $N(\mu, \sigma^2)$ distribution. Show that the sample
median, $M$, is an unbiased estimator of the population mean $\mu$. Compare the variances of
$\bar{X}$ and $M$. [Note: For the normal distribution, the mean, median, and mode all occur at
the same location. Even though both $\bar{X}$ and $M$ are unbiased, the reason we usually use the
mean instead of the median as the estimator of $\mu$ is that $\bar{X}$ has a smaller variance than $M$.]


5.4.12. Let $X_1, \ldots, X_n$ be a random sample from a Poisson distribution with parameter $\lambda$. Show
that the sample mean $\bar{X}$ is sufficient for $\lambda$.


5.4.13. Let $X_1, \ldots, X_n$ be a random sample from a population with density function

$$f_\sigma(x) = \frac{1}{2\sigma}\exp\left(-\frac{|x|}{\sigma}\right), \quad -\infty < x < \infty, \quad \sigma > 0.$$

Find a sufficient statistic for the parameter $\sigma$.


5.4.14. Show that if $\hat\theta$ is a sufficient statistic for the parameter $\theta$ and if the maximum likelihood
estimator of $\theta$ is unique, then the maximum likelihood estimator is a function of this
sufficient statistic $\hat\theta$.


5.4.15. Let $X_1, \ldots, X_n$ be a random sample from an exponential population with parameter $\theta$.


(a) Show that $\sum_{i=1}^n X_i$ is sufficient for $\theta$. Also show that $\bar{X}$ is sufficient for $\theta$.
(b) The following is a random sample from exponential distribution. 


15 30 26 68 0.7 2.22 13 16 1.1 65 
0.32.00 18 1.0 0.7 0.7 16 3.0 2.0 2.5 
5.7 0.1 0.2 05 0.4 


(i) What is an unbiased estimate of the mean?
(ii) Using part (a) and these data, find two sufficient statistics for the parameter $\theta$.


5.4.16. Let $X_1, \ldots, X_n$ be a random sample from a one-parameter Weibull distribution with pdf

$$f(x) = \begin{cases} 2\alpha x\, e^{-\alpha x^2}, & x > 0 \\ 0, & \text{otherwise.} \end{cases}$$

(a) Find a sufficient statistic for $\alpha$.
(b) Using part (a), find an UMVUE for $\alpha$.

5.4.17. Let $X_1, \ldots, X_n$ be a random sample from a population with density function

$$f(x) = \begin{cases} \dfrac{1}{\theta}, & -\dfrac{\theta}{2} \le x \le \dfrac{\theta}{2}, \quad \theta > 0 \\ 0, & \text{otherwise.} \end{cases}$$

Show that $\left(\min_{1 \le i \le n} X_i, \max_{1 \le i \le n} X_i\right)$ is sufficient for $\theta$.


5.4.18. Let $X_1, \ldots, X_n$ be a random sample from a $G(1, \beta)$ distribution.
(a) Show that $U = \sum_{i=1}^n X_i$ is a sufficient statistic for $\beta$.


(b) The following is a random sample from a $G(1, \beta)$ distribution.


0.3 34 04 18 0.7 10 01 2.3 3.7 2.0 
0.3 3.7 0.2 13 12 33 02 13 06 04 


Find a sufficient statistic for $\beta$.


5.4.19. Show that $X_1$ is not sufficient for $\mu$, if $X_1, \ldots, X_n$ is a sample from $N(\mu, 1)$.


5.4.20. Let $X_1, \ldots, X_n$ be a random sample from the truncated exponential distribution
with pdf

$$f(x) = \begin{cases} e^{\theta - x}, & x > \theta \\ 0, & \text{otherwise.} \end{cases}$$

Show that $X_{(1)} = \min(X_i)$ is sufficient for $\theta$.


5.4.21. Let $X_1, \ldots, X_n$ be a random sample from a distribution with pdf

$$f(x) = \begin{cases} \theta x^{\theta - 1}, & 0 < x < 1, \quad \theta > 0 \\ 0, & \text{otherwise.} \end{cases}$$

Show that the product $U = X_1 X_2 \cdots X_n$ is a sufficient statistic for $\theta$.

5.4.22. Let $X_1, \ldots, X_n$ be a random sample of size $n$ from a Bernoulli population with parameter
$p$. Show that $\hat{p} = \bar{X}$ is the UMVUE for $p$.

5.4.23. Let $X_1, \ldots, X_n$ be a random sample from a Rayleigh distribution with pdf


a9 
eg ere, x>0 


f@)= 


0, otherwise. 


Show that $\sum_{i=1}^n X_i^2$ is sufficient for the parameter $\alpha$.


5.5 OTHER DESIRABLE PROPERTIES OF A POINT ESTIMATOR


In this section, we discuss a few more properties of point estimators that can be used in choosing a 
particular estimator. 


5.5.1 Consistency 


It is a desirable property that the values of an estimator be closer to the value of the true parameter 
being estimated as the sample size becomes larger. To this end, we now introduce the notion of 
consistent estimators. Consistency is a large-sample, or asymptotic, property. That is, it describes 
the behavior of estimators as the sample size n becomes infinitely large. In this section, we use the 
notation $\hat\theta_n$ for $\hat\theta$ to show the dependence of the estimator on the sample size $n$.


Definition 5.5.1 The estimator $\hat\theta_n$ is said to be a consistent estimator of $\theta$ if, for any $\varepsilon > 0$,

$$\lim_{n\to\infty} P\left[|\hat\theta_n - \theta| \le \varepsilon\right] = 1$$

or equivalently,

$$\lim_{n\to\infty} P\left[|\hat\theta_n - \theta| > \varepsilon\right] = 0.$$

The statement "$\hat\theta_n$ is a consistent estimator of $\theta$" is equivalent to "$\hat\theta_n$ converges in probability to $\theta$."
That is, the sample estimator should have a high probability of being close to the population value
$\theta$ for large sample size $n$. The idea of consistency can be observed in Figure 5.2, where $\hat\theta_n$ converges
to $\theta$. If it did not, $\hat\theta_n$ would not be a consistent estimator of $\theta$.


If the estimator is unbiased, we have the following result, which gives a sufficient condition for the 
consistency of an estimator. However, it is important to note that a consistent estimator need not be 
unbiased, and hence this result is not a necessary condition. 


A SUFFICIENT CONDITION FOR CONSISTENCY OF AN UNBIASED ESTIMATOR 
Theorem 5.5.1 An unbiased estimator $\hat\theta_n$ of $\theta$ is a consistent estimator for $\theta$ if

$$\lim_{n\to\infty} \mathrm{Var}(\hat\theta_n) = 0.$$


FIGURE 5.2 Consistency of an estimator.

The proof of this theorem follows directly from Chebyshev's inequality. A general version of this result
is proved in Theorem 5.5.2.


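Example 5.5.1


Let $X_1, \ldots, X_n$ be a random sample with true mean $\mu$ and finite variance $\sigma^2$. Then, the sample mean $\bar{X}$ is a
consistent estimator of the population mean $\mu$.


Solution


We show this result in two ways.


(i) Using Chebyshev's inequality, $P\{|X - \mu| \ge \varepsilon\} \le \mathrm{Var}(X)/\varepsilon^2$, we obtain, for any $k > 0$,

$$P\left[|\bar{X} - \mu| \le k\right] \ge 1 - \frac{\mathrm{Var}(\bar{X})}{k^2} = 1 - \frac{\sigma^2}{k^2 n} \to 1 \text{ as } n \to \infty.$$

Hence, $\bar{X}$ is a consistent estimator of $\mu$.

(ii) First note that $\bar{X}$ is an unbiased estimator of $\mu$. Because $\mathrm{Var}(\bar{X}) = \sigma^2/n$, we have

$$\lim_{n\to\infty} \frac{\sigma^2}{n} = 0.$$

Thus, from the previous theorem, $\bar{X}$ is a consistent estimator of $\mu$.
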
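The convergence in Example 5.5.1 can be watched in a simulation. This is a minimal sketch under assumed settings that are not from the text: normal data with $\mu = 5$, $\sigma = 2$, and tolerance $k = 0.2$. It estimates $P[|\bar{X} - \mu| \le k]$ by Monte Carlo and prints it next to the Chebyshev lower bound $1 - \sigma^2/(k^2 n)$.

```python
import random

MU, SIGMA, EPS = 5.0, 2.0, 0.2  # illustrative values (assumed)

def prob_within(n, reps=2000, seed=0):
    """Monte Carlo estimate of P(|Xbar - MU| <= EPS) for a normal sample of size n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.gauss(MU, SIGMA) for _ in range(n)) / n
        hits += abs(xbar - MU) <= EPS
    return hits / reps

def chebyshev_lower_bound(n):
    """The bound 1 - sigma^2 / (eps^2 n), floored at zero."""
    return max(0.0, 1 - SIGMA**2 / (EPS**2 * n))

results = {n: (prob_within(n), chebyshev_lower_bound(n)) for n in (10, 100, 1000)}
for n, (estimate, bound) in results.items():
    print(f"n={n:5d}  estimated probability={estimate:.3f}  Chebyshev bound={bound:.3f}")
```

The estimated probability should climb toward 1 as $n$ grows, and it stays above the Chebyshev bound, which is valid but often quite loose.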
We can generalize Theorem 5.5.1 to the case where the estimator is biased. The following result states that
$\hat\theta_n$ is consistent whenever its mean square error decreases to zero as more and more observations are
incorporated into its computation.


TEST FOR CONSISTENCY
Theorem 5.5.2 Let $\hat\theta_n$ be an estimator of $\theta$ and let $\mathrm{Var}(\hat\theta_n)$ be finite. If

$$\lim_{n\to\infty} E\left[(\hat\theta_n - \theta)^2\right] = 0,$$

then $\hat\theta_n$ is a consistent estimator of $\theta$.


Proof. Using Chebyshev's inequality, we obtain

$$P\left[|\hat\theta_n - \theta| > \varepsilon\right] \le \frac{E\left[(\hat\theta_n - \theta)^2\right]}{\varepsilon^2}.$$

Because

$$\lim_{n\to\infty} E\left[(\hat\theta_n - \theta)^2\right] = 0 \quad \text{[by hypothesis]},$$

the right-hand side converges to zero. Thus,

$$\lim_{n\to\infty} P\left[|\hat\theta_n - \theta| > \varepsilon\right] = 0.$$

Consequently, $\hat\theta_n$ is a consistent estimator of $\theta$.


Furthermore, we know that

$$E\left[(\hat\theta_n - \theta)^2\right] = \mathrm{Var}(\hat\theta_n) + \left[B(\hat\theta_n)\right]^2,$$

and for unbiased estimators, the bias $B(\hat\theta_n)$ is zero. As a result, Theorem 5.5.1 is a particular case
of Theorem 5.5.2. We now summarize the procedure for testing for consistency of an estimator as
follows:


PROCEDURE TO TEST FOR CONSISTENCY
1. Check whether the estimator $\hat\theta_n$ is unbiased or not.
2. Calculate $\mathrm{Var}(\hat\theta_n)$ and $B(\hat\theta_n)$, the bias of $\hat\theta_n$.
3. An unbiased estimator is consistent if $\mathrm{Var}(\hat\theta_n) \to 0$ as $n \to \infty$.
4. A biased estimator is consistent if both $\mathrm{Var}(\hat\theta_n) \to 0$ and $B(\hat\theta_n) \to 0$ as $n \to \infty$.


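Example 5.5.2
Let $X_1, \ldots, X_n$ be a random sample from an $N(\mu, \sigma^2)$ population.
(a) Show that the sample variance $S^2$ is a consistent estimator for $\sigma^2$.
(b) Show that the maximum likelihood estimators for $\mu$ and $\sigma^2$ are consistent estimators for $\mu$ and $\sigma^2$.


Solution


(a) We have already seen that $E S^2 = \sigma^2$, and hence, $S^2$ is an unbiased estimator of $\sigma^2$. Because
the sample is drawn from a normal distribution, we know that $(n-1)S^2/\sigma^2$ has a chi-square
distribution with $(n - 1)$ d.f. and

$$\mathrm{Var}\left(\frac{(n-1)S^2}{\sigma^2}\right) = 2(n-1).$$

Thus,

$$2(n-1) = \mathrm{Var}\left(\frac{(n-1)S^2}{\sigma^2}\right) = \frac{(n-1)^2}{\sigma^4}\,\mathrm{Var}(S^2).$$

This implies that

$$\mathrm{Var}(S^2) = \frac{2\sigma^4}{n-1} \to 0 \text{ as } n \to \infty.$$

Hence, $S^2$ is a consistent estimator of the variance of a normal population.

(b) We have seen that the MLE of $\mu$ is $\hat\mu = \bar{X}$, and that of $\sigma^2$ is $\hat\sigma^2 = (1/n)\sum_{i=1}^n (X_i - \bar{X})^2$. Now $\hat\mu$ is
an unbiased estimator of $\mu$, and $\mathrm{Var}(\bar{X}) = \sigma^2/n \to 0$ as $n \to \infty$. Therefore, from Theorem 5.5.1,
$\bar{X}$ is a consistent estimator for $\mu$.

Now we will use the identity

$$E\left[(\hat\theta_n - \theta)^2\right] = \mathrm{Var}(\hat\theta_n) + \left[B(\hat\theta_n)\right]^2$$

for the MLE of $\sigma^2$. Because $\hat\sigma^2 = (1/n)\sum_{i=1}^n (X_i - \bar{X})^2 = ((n-1)/n)\,S^2$, the MLE for $\sigma^2$ is biased with

$$E(\hat\sigma^2) = \frac{n-1}{n}\,\sigma^2$$

and

$$B(\hat\sigma^2) = \frac{n-1}{n}\,\sigma^2 - \sigma^2 = -\frac{\sigma^2}{n}.$$

Using part (a), we get

$$\mathrm{Var}(\hat\sigma^2) = \frac{(n-1)^2}{n^2}\,\mathrm{Var}(S^2) = \frac{(n-1)^2}{n^2}\cdot\frac{2\sigma^4}{n-1} = \frac{2(n-1)(\sigma^2)^2}{n^2}.$$

Therefore,

$$\lim_{n\to\infty} E\left[(\hat\sigma^2 - \sigma^2)^2\right] = \lim_{n\to\infty}\left(\mathrm{Var}(\hat\sigma^2) + \left[B(\hat\sigma^2)\right]^2\right) = \lim_{n\to\infty}\left(\frac{2(n-1)(\sigma^2)^2}{n^2} + \frac{\sigma^4}{n^2}\right) = 0.$$

By Theorem 5.5.2, $\hat\sigma^2 = (1/n)\sum_{i=1}^n (X_i - \bar{X})^2$ is a consistent estimator of $\sigma^2$.
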

From the foregoing example we can see that consistent estimators need not be unique. It turns out that
most of the MLEs and method of moments estimators derived for important probability distributions
are consistent.

5.5.2 Efficiency


We have seen that there can be more than one unbiased estimator for a parameter $\theta$. We have also
mentioned that the one with the least variance is desirable. Here, we introduce the concept of
efficiency, which is based on comparing variances of the different unbiased estimators. If there are two
unbiased estimators, it is desirable to have the one with a smaller variance.


Definition 5.5.2 If $\hat\theta_1$ and $\hat\theta_2$ are two unbiased estimators for $\theta$, the efficiency of $\hat\theta_1$ relative to $\hat\theta_2$ is the
ratio

$$e(\hat\theta_1, \hat\theta_2) = \frac{\mathrm{Var}(\hat\theta_2)}{\mathrm{Var}(\hat\theta_1)}.$$

If $\mathrm{Var}(\hat\theta_2) > \mathrm{Var}(\hat\theta_1)$, or equivalently, $e(\hat\theta_1, \hat\theta_2) > 1$, then $\hat\theta_1$ is relatively more efficient than $\hat\theta_2$. That
is, $\hat\theta_1$ has a smaller variance as compared to the variance of $\hat\theta_2$.


We summarize the following procedure to compare the efficiencies of the different unbiased
estimators.


PROCEDURE TO TEST RELATIVE EFFICIENCY


1. Check for unbiasedness of $\hat\theta_1$ and $\hat\theta_2$.
2. Calculate the variances of $\hat\theta_1$ and $\hat\theta_2$.
3. Calculate the relative efficiency as

$$e(\hat\theta_1, \hat\theta_2) = \frac{\mathrm{Var}(\hat\theta_2)}{\mathrm{Var}(\hat\theta_1)}.$$

4. Conclusion: If $e(\hat\theta_1, \hat\theta_2) < 1$, $\hat\theta_2$ is more efficient than $\hat\theta_1$, and if $e(\hat\theta_1, \hat\theta_2) > 1$, then $\hat\theta_1$ is more efficient
than $\hat\theta_2$. Among the unbiased estimators, the more efficient estimator is preferable.
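When the variances in step 2 are hard to obtain in closed form, the procedure can be approximated by simulation. The sketch below is illustrative only and uses assumptions not in the text: normal data, with the sample mean as the first estimator and the sample median as the second. Both are unbiased for $\mu$ by symmetry. The function name and the default values are hypothetical.

```python
import random
import statistics

def relative_efficiency(n=25, reps=4000, mu=0.0, sigma=1.0, seed=42):
    """Estimate e(mean, median) = Var(median) / Var(mean) by simulation."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(statistics.median(sample))
    # Step 2: variances of the two estimators across replications.
    var_mean = statistics.variance(means)
    var_median = statistics.variance(medians)
    # Step 3: efficiency of the first estimator relative to the second.
    return var_median / var_mean

e_mean_vs_median = relative_efficiency()
print(f"e(mean, median) is approximately {e_mean_vs_median:.2f}")
```

A value above 1 means, by step 4, that the sample mean is the more efficient of the two for normal data.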


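Example 5.5.3
Let $X_1, \ldots, X_n$, $n > 3$, be a random sample from a population with a true mean $\mu$ and variance $\sigma^2$.
Consider the following three estimators of $\mu$:

$$\hat\theta_1 = \frac{1}{3}(X_1 + X_2 + X_3),$$

$$\hat\theta_2 = \frac{1}{8}X_1 + \frac{3}{4(n-2)}(X_2 + \cdots + X_{n-1}) + \frac{1}{8}X_n,$$

and

$$\hat\theta_3 = \bar{X}.$$

(a) Show that each of the three estimators is unbiased.
(b) Find $e(\hat\theta_2, \hat\theta_1)$, $e(\hat\theta_3, \hat\theta_1)$, and $e(\hat\theta_3, \hat\theta_2)$. Which of the three estimators is more efficient?

Solution
(a) Given $E(X_i) = \mu$, $i = 1, 2, \ldots, n$. Then,

$$E(\hat\theta_1) = \frac{1}{3}\left[E(X_1) + E(X_2) + E(X_3)\right] = \frac{3\mu}{3} = \mu,$$

$$E(\hat\theta_2) = \frac{1}{8}E(X_1) + \frac{3}{4(n-2)}\left(E(X_2) + \cdots + E(X_{n-1})\right) + \frac{1}{8}E(X_n) = \frac{\mu}{8} + \frac{3}{4(n-2)}(n-2)\mu + \frac{\mu}{8} = \mu,$$

and $E(\hat\theta_3) = E(\bar{X}) = \mu$.

Hence, $\hat\theta_1$, $\hat\theta_2$, and $\hat\theta_3$ are unbiased estimators of $\mu$.

(b) Computing the variances, we have

$$\mathrm{Var}(\hat\theta_1) = \frac{1}{9}\left(\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3)\right) = \frac{3\sigma^2}{9} = \frac{\sigma^2}{3},$$

$$\mathrm{Var}(\hat\theta_2) = \frac{\sigma^2}{64} + \frac{9(n-2)\sigma^2}{16(n-2)^2} + \frac{\sigma^2}{64} = \frac{2\sigma^2}{64} + \frac{9\sigma^2}{16(n-2)} = \frac{n + 16}{32(n-2)}\,\sigma^2,$$

$$\mathrm{Var}(\hat\theta_3) = \frac{\sigma^2}{n}.$$

The relative efficiencies are

$$e(\hat\theta_1, \hat\theta_2) = \frac{\mathrm{Var}(\hat\theta_2)}{\mathrm{Var}(\hat\theta_1)} = \frac{\sigma^2(n+16)/32(n-2)}{\sigma^2/3} = \frac{3(n+16)}{32(n-2)} < 1 \text{ for } n > 3,$$

so that $e(\hat\theta_2, \hat\theta_1) = 32(n-2)/[3(n+16)] > 1$. Thus, for $n \ge 4$, $\hat\theta_2$ is more efficient than $\hat\theta_1$.

$$e(\hat\theta_3, \hat\theta_1) = \frac{\mathrm{Var}(\hat\theta_1)}{\mathrm{Var}(\hat\theta_3)} = \frac{\sigma^2/3}{\sigma^2/n} = \frac{n}{3} > 1 \text{ for } n \ge 4.$$

Hence, for $n > 3$, $\hat\theta_3$ is more efficient than $\hat\theta_1$.

$$e(\hat\theta_3, \hat\theta_2) = \frac{\mathrm{Var}(\hat\theta_2)}{\mathrm{Var}(\hat\theta_3)} = \frac{(n+16)\sigma^2/32(n-2)}{\sigma^2/n} = \frac{n(n+16)}{32(n-2)} \ge 1 \text{ for } n \ge 4,$$

because $n(n+16) - 32(n-2) = (n-8)^2 \ge 0$. Equality holds only for $n = 8$, where $\hat\theta_2$ coincides with $\bar{X}$.

Therefore, even though both $\hat\theta_2$ and $\hat\theta_3$ are based on all the $n$ observations, for $n > 3$, the sample mean $\hat\theta_3$ is
at least as efficient as $\hat\theta_2$, and strictly more efficient unless $n = 8$.
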
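The variances in Example 5.5.3 follow from $\mathrm{Var}(\sum c_i X_i) = \sigma^2 \sum c_i^2$ for independent observations with common variance, so the efficiencies can be checked with exact rational arithmetic. The following sketch is an illustration; the helper names `variance_factor`, `weights_for`, and `efficiency` are hypothetical.

```python
from fractions import Fraction

def variance_factor(weights):
    """Var(sum c_i X_i) / sigma^2 for independent X_i with common variance."""
    return sum(Fraction(c) ** 2 for c in weights)

def weights_for(n):
    """Coefficient vectors of the three estimators in Example 5.5.3."""
    w1 = [Fraction(1, 3)] * 3 + [Fraction(0)] * (n - 3)
    w2 = [Fraction(1, 8)] + [Fraction(3, 4 * (n - 2))] * (n - 2) + [Fraction(1, 8)]
    w3 = [Fraction(1, n)] * n
    return w1, w2, w3

def efficiency(first, second):
    """e(first, second) = Var(second) / Var(first), as in Definition 5.5.2."""
    return variance_factor(second) / variance_factor(first)

for n in (4, 8, 20):
    w1, w2, w3 = weights_for(n)
    print(n, efficiency(w2, w1), efficiency(w3, w1), efficiency(w3, w2))
```

Each weight vector sums to 1, which is the unbiasedness condition of part (a); the case $n = 8$ shows the efficiency of the sample mean relative to the second estimator equal to exactly 1.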


It is reasonable to compare estimators on the basis of variance alone if they are both unbiased. 
To facilitate the cases where the estimators are biased, we use the mean square error (MSE) in the 
definition of relative efficiency. 


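Definition 5.5.3 An estimator $\hat\theta_1$ is more efficient than $\hat\theta_2$ if

$$\mathrm{MSE}(\hat\theta_1) \le \mathrm{MSE}(\hat\theta_2)$$

with strict inequality for some $\theta$. Also, the relative efficiency of $\hat\theta_1$ with respect to $\hat\theta_2$ is

$$e(\hat\theta_1, \hat\theta_2) = \frac{\mathrm{MSE}(\hat\theta_2)}{\mathrm{MSE}(\hat\theta_1)}.$$


Example 5.5.4
Let $X_1, \ldots, X_n$, $n \ge 2$, be a random sample from a normal population with a true mean $\mu$ and variance $\sigma^2$.
Consider the following two estimators of $\sigma^2$: $\hat\theta_1 = S^2$, and $\hat\theta_2 = S'^2$. Find $e(\hat\theta_1, \hat\theta_2)$.


Solution
Because $(n-1)S^2/\sigma^2 \sim \chi^2(n-1)$, $E(S^2) = \sigma^2$, and $\mathrm{MSE}(S^2) = \mathrm{Var}(S^2)$. Also,

$$2(n-1) = \mathrm{Var}\left(\frac{(n-1)S^2}{\sigma^2}\right) = \frac{(n-1)^2}{\sigma^4}\,\mathrm{Var}(S^2).$$

Thus,

$$\mathrm{MSE}(S^2) = \mathrm{Var}(S^2) = \frac{2\sigma^4}{n-1}.$$

Also, it can be shown that

$$\mathrm{MSE}(S'^2) = \frac{2n-1}{n^2}\,\sigma^4.$$

Thus, the relative efficiency of $\hat\theta_1$ with respect to $\hat\theta_2$ is

$$e(\hat\theta_1, \hat\theta_2) = \frac{\mathrm{MSE}(\hat\theta_2)}{\mathrm{MSE}(\hat\theta_1)} = \frac{\mathrm{MSE}(S'^2)}{\mathrm{MSE}(S^2)} = \frac{(2n-1)\sigma^4/n^2}{2\sigma^4/(n-1)} = \frac{(2n-1)(n-1)}{2n^2}.$$

For $n \ge 2$, it can be seen that $e(\hat\theta_1, \hat\theta_2) < 1$. Hence, $S'^2$ is relatively more efficient than $S^2$.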
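The ratio in Example 5.5.4 can be tabulated exactly from the identity MSE = variance + bias squared. This is a small sketch with hypothetical function names; all quantities are expressed in units of $\sigma^4$, and the variance and bias of $S'^2$ are those of a normal sample, $2(n-1)\sigma^4/n^2$ and $-\sigma^2/n$.

```python
from fractions import Fraction

def mse_s2(n):
    """MSE of the unbiased S^2 in units of sigma^4: Var(S^2) = 2 / (n - 1)."""
    return Fraction(2, n - 1)

def mse_s2_prime(n):
    """MSE of S'^2 in units of sigma^4: variance plus squared bias."""
    variance = Fraction(2 * (n - 1), n * n)
    bias_squared = Fraction(1, n * n)
    return variance + bias_squared

def efficiency_s2_vs_s2_prime(n):
    """e(S^2, S'^2) = MSE(S'^2) / MSE(S^2), as in Definition 5.5.3."""
    return mse_s2_prime(n) / mse_s2(n)

for n in (2, 5, 10, 100):
    e = efficiency_s2_vs_s2_prime(n)
    print(n, e, float(e))
```

The ratio stays below 1 and approaches 1 as $n$ grows, so the advantage of $S'^2$ in mean square error fades for large samples.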


We have seen that it is possible that one unbiased estimator is more efficient than another. This 
leads to the possibility of having one unbiased estimator more efficient than all the other unbiased 
estimators. This directs us to the following definition. 


Definition 5.5.4 An unbiased estimator $\hat\theta_0$ is said to be a uniformly minimum variance unbiased
estimator (UMVUE) for the parameter $\theta$ if, for any other unbiased estimator $\hat\theta$,

$$\mathrm{Var}(\hat\theta_0) \le \mathrm{Var}(\hat\theta),$$

for all possible values of $\theta$.


It is not always easy to find an UMVUE for a parameter. However, the following result gives a lower 
bound for the variance of any unbiased estimator. 


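CRAMÉR-RAO INEQUALITY


Theorem 5.5.3 Let $X_1, \ldots, X_n$ be a random sample from a population with pdf (or pf) $f_\theta(x)$ that depends
on a parameter $\theta$. If $\hat\theta$ is an unbiased estimator of $\theta$, then, under very general conditions, the following
inequality is true:

$$\mathrm{Var}(\hat\theta) \ge \frac{1}{nE\left[\left(\dfrac{\partial \ln f_\theta(X)}{\partial\theta}\right)^2\right]}.$$

If $\hat\theta$ is an unbiased estimator of $\psi(\theta)$, then

$$\mathrm{Var}(\hat\theta) \ge \frac{[\psi'(\theta)]^2}{nE\left[\left(\dfrac{\partial \ln f_\theta(X)}{\partial\theta}\right)^2\right]}.$$

If $L(\theta)$ is the likelihood function, we can rewrite the Cramér-Rao inequality in the form

$$\mathrm{Var}(\hat\theta) \ge \frac{1}{E\left[\left(\dfrac{\partial \ln L(\theta)}{\partial\theta}\right)^2\right]}.$$

From the Cramér-Rao inequality, we can obtain the following result.


EFFICIENT ESTIMATOR
Theorem 5.5.4 If $\hat\theta$ is an unbiased estimator of $\theta$ and if

$$\mathrm{Var}(\hat\theta) = \frac{1}{nE\left[\left(\dfrac{\partial \ln f_\theta(X)}{\partial\theta}\right)^2\right]},$$

then $\hat\theta$ is a uniformly minimum variance unbiased estimator (UMVUE) of $\theta$. Sometimes $\hat\theta$ is also referred to as
an efficient estimator.


Note that if the function $f(\cdot)$ is sufficiently smooth, it can be shown that

$$E\left[\left(\frac{\partial \ln f_\theta(X)}{\partial\theta}\right)^2\right] = -E\left[\frac{\partial^2 \ln f_\theta(X)}{\partial\theta^2}\right] = \mathrm{Var}\left[\frac{\partial \ln f_\theta(X)}{\partial\theta}\right].$$

Hence, the Cramér-Rao inequality in this case can be rewritten as

$$\mathrm{Var}(\hat\theta) \ge \frac{1}{-nE\left[\dfrac{\partial^2 \ln f_\theta(X)}{\partial\theta^2}\right]} = \frac{1}{n\,\mathrm{Var}\left[\dfrac{\partial \ln f_\theta(X)}{\partial\theta}\right]}.$$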

Now, we will give a procedure to apply the Cramér—Rao inequality. 


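CRAMÉR-RAO PROCEDURE TO TEST FOR EFFICIENCY
1. For the pdf (or pf), find $\dfrac{\partial \ln f(x)}{\partial\theta}$ and $\dfrac{\partial^2 \ln f(x)}{\partial\theta^2}$.
2. Calculate $\dfrac{1}{-nE\left[\partial^2 \ln f(X)/\partial\theta^2\right]}$ if $f(x)$ is smooth, or else calculate $\dfrac{1}{nE\left[\left(\partial \ln f(X)/\partial\theta\right)^2\right]}$.
3. Calculate $\mathrm{Var}(\hat\theta)$.
4. If the result of step 2 is equal to the result of step 3, then $\hat\theta$ is efficient for $\theta$.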
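The four steps can be carried out numerically for a discrete distribution by summing over the support. The sketch below is an illustration under assumed settings (a Poisson pmf with $\lambda = 3$, sample size $n = 20$, and a sum truncated at a hypothetical `x_max`); it uses the score $\partial \ln p(x)/\partial\lambda = x/\lambda - 1$ and compares the resulting bound with $\mathrm{Var}(\bar{X}) = \lambda/n$.

```python
import math

def poisson_pmf(x, lam):
    """Poisson probability computed on the log scale for numerical stability."""
    return math.exp(-lam + x * math.log(lam) - math.lgamma(x + 1))

def fisher_information(lam, x_max=200):
    """E[(d/d lam ln p(X))^2] with score x/lam - 1, truncating the sum at x_max."""
    return sum((x / lam - 1) ** 2 * poisson_pmf(x, lam) for x in range(x_max + 1))

lam, n = 3.0, 20  # illustrative values (assumed)

# Step 2: the Cramer-Rao lower bound 1 / (n E[(score)^2]).
bound = 1 / (n * fisher_information(lam))
# Step 3: variance of the sample mean of n Poisson observations.
var_xbar = lam / n
# Step 4: compare the two numbers.
print(f"Cramer-Rao bound = {bound:.6f}, Var(Xbar) = {var_xbar:.6f}")
```

The two numbers agree up to truncation and rounding error, which is the numerical counterpart of concluding that the sample mean is efficient for $\lambda$.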


Example 5.5.5
Let $X_1, \ldots, X_n$ be a random sample from an $N(\mu, \sigma^2)$ population with density function $f(x)$. Show that $\bar{X}$
is an efficient estimator for $\mu$.

Solution
To calculate the Cramér-Rao lower bound, we have

$$\ln f(x) = c - \frac{(x - \mu)^2}{2\sigma^2},$$

where $c$ is a constant not involving $\mu$. Then

$$\frac{\partial \ln f(x)}{\partial\mu} = \frac{x - \mu}{\sigma^2}$$

and

$$\frac{\partial^2 \ln f(x)}{\partial\mu^2} = -\frac{1}{\sigma^2}.$$

Hence,

$$\frac{1}{-nE\left[\dfrac{\partial^2 \ln f(X)}{\partial\mu^2}\right]} = \frac{1}{n\left(\dfrac{1}{\sigma^2}\right)} = \frac{\sigma^2}{n} = \mathrm{Var}(\bar{X}).$$

Therefore, $\bar{X}$ is an efficient estimator of $\mu$. That is, $\bar{X}$ is an UMVUE of $\mu$.



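Example 5.5.6
Suppose $p(x)$ is the Poisson distribution with parameter $\lambda$. Show that the sample mean $\bar{X}_n$ is an efficient
estimator for $\lambda$.


Solution

Here the density function is given by $p(x) = \dfrac{e^{-\lambda}\lambda^x}{x!}$. Taking logarithms,

$$\ln p(x) = x\ln\lambda - \lambda - \ln(x!),$$

$$\frac{\partial \ln p(x)}{\partial\lambda} = \frac{x}{\lambda} - 1,$$

and

$$\frac{\partial^2 \ln p(x)}{\partial\lambda^2} = -\frac{x}{\lambda^2}.$$

Therefore, using the fact that the expected value of a Poisson r.v. is $\lambda$,

$$\frac{1}{-nE\left[\dfrac{\partial^2 \ln p(X)}{\partial\lambda^2}\right]} = \frac{1}{n\left(\dfrac{\lambda}{\lambda^2}\right)} = \frac{\lambda}{n} = \mathrm{Var}(\bar{X}).$$

Hence, $\bar{X}$ is an efficient estimator of $\lambda$.
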


Example 5.5.7
Let $X_1, \ldots, X_n$ be a random sample from a Bernoulli trial with probability of success $p$. Show that the
maximum likelihood estimator is also an efficient estimator.

Solution

Note that the MLE of $p$ is $\hat{p} = (1/n)\sum_{i=1}^n X_i = X/n$, where $X = \sum_{i=1}^n X_i$, the fraction of successes in the total number of trials,
$n$. Because we can view $n$ Bernoulli trials as being a single observation from a binomial distribution with
parameters $n$ and $p$, the likelihood function is

$$L(p) = \binom{n}{x} p^x (1 - p)^{n-x}.$$

Then,

$$\ln L(p) = \ln\binom{n}{x} + x\ln p + (n - x)\ln(1 - p).$$

Now

$$\frac{\partial \ln L(p)}{\partial p} = \frac{x}{p} - \frac{n - x}{1 - p} = \frac{x - np}{p(1 - p)}.$$

Hence,

$$E\left[\left(\frac{\partial \ln L(p)}{\partial p}\right)^2\right] = E\left[\left(\frac{X - np}{p(1 - p)}\right)^2\right] = \frac{\mathrm{Var}(X)}{[p(1 - p)]^2} = \frac{np(1 - p)}{[p(1 - p)]^2} = \frac{n}{p(1 - p)}.$$

Therefore, the Cramér-Rao bound is

$$\frac{1}{E\left[\left(\dfrac{\partial \ln L(p)}{\partial p}\right)^2\right]} = \frac{p(1 - p)}{n}.$$

Now

$$\mathrm{Var}(\hat{p}) = \mathrm{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\mathrm{Var}(X) = \frac{1}{n^2}\,np(1 - p) = \frac{p(1 - p)}{n}.$$

Because the variance of the estimator is equal to the Cramér-Rao lower bound, we conclude that $\hat{p} = X/n$ is
an efficient estimator of $p$.


It is important to note that an UMVUE may not exist for a given problem. Even when an UMVUE
exists, it is not necessary that it have a variance equal to the Cramér-Rao lower bound. The term

$$I(\theta) = E\left[\left(\frac{\partial \ln f(X)}{\partial\theta}\right)^2\right]$$

is called the Fisher information. In fact, for a random sample of size $n$ with
likelihood function $L(\theta)$, the Fisher information is defined as

$$I_n(\theta) = E\left[\left(\frac{\partial \ln L(\theta)}{\partial\theta}\right)^2\right].$$

It can be shown that the Fisher information in a sample of size $n$ is $n$ times the Fisher information in one
observation. That is, $I_n(\theta) = nI(\theta)$.


5.5.3 Minimal Sufficiency and Minimum-Variance Unbiased Estimation 


In the study of statistics, it is desirable to reduce the data contained in the sample as much as possible 
without losing relevant information. Our objective is to find minimal sufficient statistics and use 
them to develop uniformly minimum variance unbiased estimators (UMVUEs) for true parameters. 
Whenever sufficient statistics exist, then a statistician with those summary measures is as well off as 
the statistician with the entire sample, for point estimation purposes. Minimal sufficient statistics are 
those that are sufficient for the parameters and are functions of every other set of sufficient statistics 
for those same parameters. 


Definition 5.5.5 A sufficient statistic $T(X)$ is called a minimal sufficient statistic if for any other sufficient statistic
$T'(X)$, $T(X)$ is a function of $T'(X)$. That is,

$$T(X) = g(T'(X))$$

for some function $g$.


Using this definition, it is difficult to determine whether a set of statistics is, in fact, minimal sufficient.
Now we will present a method due to Lehmann and Scheffé that will be of great help in finding a
minimal sufficient statistic.


We can summarize the Lehmann and Scheffé method to find a minimal sufficient statistic as
follows. Let $X_1, \ldots, X_n$ be a random sample with pdf or pmf $f(x)$ that depends on a parameter $\theta$. Let
$(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$ be two different sets of values of $(X_1, \ldots, X_n)$. Let

$$\frac{L(\theta; x_1, \ldots, x_n)}{L(\theta; y_1, \ldots, y_n)}$$

be the ratio of the likelihoods evaluated at these two points. Suppose it is possible to find a
function $g(x_1, \ldots, x_n)$ such that this ratio will be free of the unknown parameter $\theta$ if and only if
$g(x_1, \ldots, x_n) = g(y_1, \ldots, y_n)$. If such a function $g$ can be found, then $g(X_1, \ldots, X_n)$ is a minimal
sufficient statistic for $\theta$.


Example 5.5.8
Let $X_1, \ldots, X_n$ be a random sample from the Bernoulli distribution where $P(X_i = 1) = p$ and
$P(X_i = 0) = 1 - p$, with $p$ unknown. Find a minimal sufficient statistic for $p$.

Solution
The ratio of the likelihoods is

$$\frac{L(x_1, \ldots, x_n)}{L(y_1, \ldots, y_n)} = \frac{p(x_1, \ldots, x_n)}{p(y_1, \ldots, y_n)} = \frac{p^{\sum x_i}(1 - p)^{n - \sum x_i}}{p^{\sum y_i}(1 - p)^{n - \sum y_i}} = \left(\frac{p}{1 - p}\right)^{\sum x_i - \sum y_i}.$$

This ratio is independent of $p$ if and only if

$$\sum_{i=1}^n x_i - \sum_{i=1}^n y_i = 0,$$

which implies

$$\sum_{i=1}^n x_i = \sum_{i=1}^n y_i.$$

Therefore,

$$g(X_1, \ldots, X_n) = \sum_{i=1}^n X_i$$

is a minimal sufficient statistic for $p$.

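The Lehmann and Scheffé criterion for the Bernoulli case can be probed numerically by evaluating the likelihood ratio on a grid of $p$ values. The sketch below is only an illustration: the function names, the grid, and the tolerance are assumptions, and a grid check of this kind can suggest, but not prove, that a ratio is free of $p$.

```python
def likelihood(sample, p):
    """Bernoulli likelihood p^(sum x_i) * (1 - p)^(n - sum x_i)."""
    successes = sum(sample)
    return p**successes * (1 - p) ** (len(sample) - successes)

def ratio_is_free_of_p(x, y, grid=(0.1, 0.3, 0.5, 0.7, 0.9), tol=1e-12):
    """True when L(p; x) / L(p; y) takes the same value at every grid point."""
    ratios = [likelihood(x, p) / likelihood(y, p) for p in grid]
    return max(ratios) - min(ratios) < tol

x = (1, 0, 1, 0)
y = (0, 1, 1, 0)  # same number of successes as x
z = (1, 1, 1, 0)  # one more success than x

print(ratio_is_free_of_p(x, y))
print(ratio_is_free_of_p(x, z))
```

Samples with equal sums give a constant ratio, while samples with different sums give a ratio that changes with $p$, matching the conclusion that $\sum X_i$ is minimal sufficient.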


Example 5.5.9


Let $X_1, \ldots, X_n$ be a random sample from a $U(0, \theta)$ distribution. Find a minimal sufficient statistic for $\theta$.
Solution
The likelihood function is

$$L(\theta; x_1, \ldots, x_n) = \begin{cases} \dfrac{1}{\theta^n}, & \text{if } \max(x_1, \ldots, x_n) < \theta \\ 0, & \text{otherwise.} \end{cases}$$

Denote $x_{\max} = \max(x_1, \ldots, x_n)$ and $y_{\max} = \max(y_1, \ldots, y_n)$. Then, the ratio of the likelihood
functions is

$$\frac{L(x_1, \ldots, x_n)}{L(y_1, \ldots, y_n)} = \begin{cases} 1, & \text{if } \max(x_{\max}, y_{\max}) < \theta, \\ 0, & \text{if } y_{\max} < x_{\max} \text{ and } y_{\max} < \theta \le x_{\max}, \\ \text{undefined}, & \text{elsewhere.} \end{cases}$$

Thus, the ratio will not depend on $\theta$ if and only if $x_{\max} = y_{\max}$. Therefore, a minimal sufficient statistic for
$\theta$ is $X_{(n)}$, the largest order statistic.

It is important to note that although we often can find a single statistic that is minimal sufficient for
one parameter, this need not be the case (see Exercise 5.5.1). For most of the density functions that we
consider, any unbiased estimator that is a function of a minimal sufficient statistic will be a uniformly
minimum variance unbiased estimator (UMVUE), that is, it will possess the smallest variance possible
among unbiased estimators.


Example 5.5.10
Let $X_1, \ldots, X_n$ be a random sample from the normal distribution with known mean $\mu = \mu_0$ and unknown
variance $\sigma^2$. Show that $\sum_{i=1}^n (X_i - \mu_0)^2$ is the minimal sufficient statistic for $\sigma^2$. Use this statistic to find
an MVUE of $\sigma^2$.


Solution
The ratio of the likelihoods is

$$\frac{L(x_1, \ldots, x_n)}{L(y_1, \ldots, y_n)} = \frac{\exp\left[-\sum (x_i - \mu_0)^2/2\sigma^2\right]}{\exp\left[-\sum (y_i - \mu_0)^2/2\sigma^2\right]} = \exp\left\{-\frac{1}{2\sigma^2}\left[\sum (x_i - \mu_0)^2 - \sum (y_i - \mu_0)^2\right]\right\}.$$

In order for this ratio to be free of $\sigma^2$, we need

$$\sum (x_i - \mu_0)^2 = \sum (y_i - \mu_0)^2.$$

Hence, $\sum (X_i - \mu_0)^2$ is minimal sufficient for $\sigma^2$.
Because $E(X_i - \mu_0)^2 = \sigma^2$, we can see that $(1/n)\sum_{i=1}^n (X_i - \mu_0)^2$ is an unbiased estimator of $\sigma^2$. Because
this is a function of a minimal sufficient statistic, $(1/n)\sum_{i=1}^n (X_i - \mu_0)^2$ is an MVUE of $\sigma^2$.



EXERCISES 5.5 


5.5.1. Show that the maximum likelihood estimator for p, Y,/n in a binomial distribution is 
consistent. 


5.5.2. Show that $Y_n$, the $n$th-order statistic from a $U(0, \theta)$ distribution, is a consistent estimator for
$\theta$.


5.5.3. Let X;,...,X, be a random sample with EX; = w;, EX? = w, and EX? = w/, all finite. 
Show that S* = (1/n) )“_, (X; — X)? is a consistent estimator of 0? = Var(X;). 


5.5.4. Let $X_1, \ldots, X_n$ be a random sample from a population with pdf

$$f(x) = \begin{cases} \alpha x^{\alpha - 1}, & \text{for } 0 < x < 1, \ \alpha > 0 \\ 0, & \text{otherwise.} \end{cases}$$

Is the method of moments estimator for $\alpha$ consistent?


5.5.5. Let $X_1, \ldots, X_n$ be a random sample from an exponential population with parameter $\theta$. Show
that $\bar{X}$ is a consistent estimator of $\theta$.


5.5.6. Let $X_1, \ldots, X_n$ and $Y_1, \ldots, Y_n$ be independent random samples from populations with
means $\mu_1$ and $\mu_2$ and variances $\sigma_1^2$ and $\sigma_2^2$, respectively. Show that the difference $\bar{X} - \bar{Y}$ is a
consistent estimator of $\mu_1 - \mu_2$.


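5.5.7. Let $X_1, \ldots, X_n$ be a random sample from a population with pdf

$$f(x) = \begin{cases} \dfrac{1}{\alpha}\, x^{(1-\alpha)/\alpha}, & 0 < x < 1; \ \alpha > 0 \\ 0, & \text{otherwise.} \end{cases}$$

(a) Show that the maximum likelihood estimator of $\alpha$ is $\hat\alpha = -(1/n)\sum_{i=1}^n \ln X_i$.
(b) Is $\hat\alpha$ of part (a) an unbiased estimator of $\alpha$?
(c) Is $\hat\alpha$ of part (a) a consistent estimator of $\alpha$?


5.5.8. Let $X_1, \ldots, X_n$ be a random sample from a Rayleigh distribution with pdf

$$f(x) = \begin{cases} \dfrac{x}{\alpha}\, e^{-x^2/(2\alpha)}, & \text{for } x > 0 \\ 0, & \text{otherwise.} \end{cases}$$

(a) Determine the maximum likelihood estimator $\hat\alpha$ of $\alpha$.
(b) Is $\hat\alpha$ of part (a) an unbiased estimator of $\alpha$?
(c) Is $\hat\alpha$ of part (a) a consistent estimator of $\alpha$?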


5.5.9. Let X1,..., Xn bearandom sample from the uniform distribution on the interval (6, 6 + 1). 
Let 


where Xm) is the nth order statistic. Find the efficiency of 6) relative to 6. 


5.5.10. Let $X_1, \ldots, X_n$ be a random sample from an $N(\mu, \sigma^2)$ population. Let $\hat\theta_1$ be the sample
mean and $\hat\theta_2$ be the sample median. It is known that $\mathrm{Var}(\hat\theta_2) = (1.2533)^2(\sigma^2/n)$. Find the
efficiency of $\hat\theta_2$ relative to $\hat\theta_1$.


5.5.11. Let $X_1, \ldots, X_n$ be a random sample from an exponential population with parameter $\theta$. Show
that $\bar{X}$ is efficient for $\theta$.


5.5.12. Let $X_1, \ldots, X_n$ be a random sample from an $N(\mu, \sigma^2)$ population. Show that

$$\mathrm{MSE}(S'^2) = \frac{2n - 1}{n^2}\,\sigma^4,$$

where $S'^2 = (1/n)\sum_{i=1}^n (X_i - \bar{X})^2$.

5.5.13. Prove

$$E\left[\left(\frac{\partial \ln f(X)}{\partial\theta}\right)^2\right] = -E\left[\frac{\partial^2 \ln f(X)}{\partial\theta^2}\right],$$

making suitable assumptions.

5.5.14. Let $X_1, \ldots, X_n$ be a random sample from an $N(\mu, \sigma^2)$ population.


(a) Show that the sample variance $S^2$ is an UMVUE for $\sigma^2$ when the value of $\mu$ is not
known.


(b) Show that the variance of $S^2$ is greater than the Cramér-Rao lower bound.


5.5.15. Let X1,...,X, be arandom sample from a U(0, 9) distribution. Let X(,) be the nth order 
statistic. 


(a) Show that 61 =X n), O02=2X, and 6,=2t1 x n) are unbiased estimators of 0. 
(n) n (n) 

(b) Find the efficiency of 6; relative to >. 

(c) Find the efficiency of 62 relative to 63. 


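5.5.16. Let $X_1, \ldots, X_n$ $(n \ge 2)$ be a random sample from a distribution with pdf

$$f(x) = \frac{1}{\pi\left[1 + (x - \theta)^2\right]}, \quad -\infty < x < \infty, \quad -\infty < \theta < \infty.$$

Show that the Cramér-Rao lower bound for an unbiased estimator of $\theta$ is $2/n$.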


5.5.17. Let X1,..., Xn, > 4, bearandom sample from a population with a mean yw and variance 
o*. Consider the following three estimators of ju: 


7 1 

Oy at 22 + Aa XA), 

bie pe + (X3+...4+X, eae 

ae 1 5 1] 5(n 3) 3 tee n—-1 5 ns 
and 63 = X. 


(a) Show that each of the three estimators is unbiased. 
(b) Find $e(\hat{\theta}_2, \hat{\theta}_1)$, $e(\hat{\theta}_3, \hat{\theta}_1)$, and $e(\hat{\theta}_3, \hat{\theta}_2)$.


5.5.18. Find the Cramér-Rao lower bound for the variance of an unbiased estimator of $\theta$, based on
a sample of size n for the following pdfs:
(i) £0) = pre /*, x>0, O>0. 
(ii) $f(x, \theta) = \theta x^{\theta - 1}$, $0 < x < 1$, $\theta > 0$.


5.5.19. Let $Y_1, \ldots, Y_n$ be a random sample from the uniform distribution over the interval
$(\theta - 1, \theta + 1)$. Show that the order statistics $X_1 = \min(Y_i)$ and $X_n = \max(Y_i)$ are jointly
sufficient for $\theta$. Also, show that $X_1$ and $X_n$ are jointly minimal sufficient for $\theta$.
5.5.20. Let $X_1, \ldots, X_n$ be a random sample from a normal distribution with unknown mean $\mu$
and known variance $\sigma^2$. Find the maximum likelihood estimator of $\mu$ and show that it is a
function of a minimal sufficient statistic.


5.5.21. Let $X_1, \ldots, X_n$ be a random sample from a normal distribution with unknown mean $\mu$ and
unknown variance $\sigma^2$. Show that $\sum_{i=1}^{n} X_i$ and $\sum_{i=1}^{n} X_i^2$ are jointly minimal sufficient for $\mu$
and $\sigma^2$. Also show that $\bar{X}$ and $S^2$ are UMVUEs for $\mu$ and $\sigma^2$.


5.5.22. Let $X_1, \ldots, X_n$ be a random sample from the Weibull density

$$f(x) = \begin{cases} \dfrac{2x}{\alpha}\, e^{-x^2/\alpha}, & x > 0 \\ 0, & \text{otherwise.} \end{cases}$$

Find an UMVUE for $\alpha$.


5.5.23. Let $X_1, \ldots, X_n$ be a random sample from a Poisson distribution with parameter $\lambda$. Find a
minimal sufficient statistic for $\lambda$.


5.5.24. Let $X_1, \ldots, X_n$ be a random sample from a gamma distribution with parameters $\alpha$ and $\beta$,
both unknown. Find minimal sufficient statistics for the parameters $\alpha$ and $\beta$.


5.5.25. Let $X_1, \ldots, X_n$ be a random sample from a distribution with density function
eB x>
fa)= | 0, ee
Find an UMVUE for $\beta$.


5.5.26. Let $X_1, \ldots, X_n$ be a random sample from the exponential distribution with pdf

$$f(x) = \begin{cases} \dfrac{1}{\beta}\, e^{-x/\beta}, & x > 0 \\ 0, & \text{otherwise.} \end{cases}$$

Show that $\bar{X}$ is an UMVUE for $\beta$. Also show that $\left(\frac{n}{n+1}\right) \bar{X}^2$ is an MVUE for $\beta^2$.


5.5.27. Let $X_1, \ldots, X_n$ be a random sample from a Rayleigh distribution with pdf

$$f(x) = \begin{cases} \dfrac{2x}{\beta}\, e^{-x^2/\beta}, & x > 0 \\ 0, & \text{otherwise.} \end{cases}$$

Find an UMVUE for $\beta$.


5.6 CHAPTER SUMMARY 


In this chapter we have discussed the basic concepts of point estimation. Two methods of finding point
estimators were described: the method of moments and the method of maximum likelihood. We
have seen that maximum likelihood estimators possess the invariance property, which states that if
$\hat{\theta}$ is a maximum likelihood estimator of the parameter $\theta$, then $h(\hat{\theta})$ is a maximum likelihood estimator
for $h(\theta)$. Some desirable properties of the point estimators that we have discussed are unbiasedness,
consistency, efficiency, and sufficiency. Unbiasedness means that the expected value of the sample
statistic (the mean of its probability distribution) should be equal to the parameter. Unbiasedness
guards against consistently producing under- or overestimates of the parameter in repeated sampling.
If the estimator is consistent, then, as the sample size increases, the estimator can be expected to get
closer and closer to the population parameter. Efficient estimators have the lowest variance among
all unbiased estimators. A sufficient estimator is a "good" estimator of the population parameter $\theta$ in the
sense that it retains all the information about $\theta$ contained in the sample while depending on fewer data values.


We will now list some of the key definitions introduced in this chapter. 


Method of moments 

Likelihood function 

Maximum likelihood equations 
Unbiased estimator 

Mean square error 

Minimum variance unbiased estimator 
Consistent estimator 

Efficiency 

Uniformly minimum variance unbiased estimator 
Efficient estimator 

Sufficient estimator 

Jointly sufficient 

Minimal sufficient statistic 


In this chapter, we have also learned the following important concepts and procedures. 


The method of moments procedure 
Procedure to find MLE 

Procedure to test for consistency 

Procedure to test relative efficiency 
Cramér-Rao procedure to test for efficiency 
Procedure to verify sufficiency 


5.7 COMPUTER EXAMPLES 


Because in the earlier chapters we have already given steps to obtain summary statistics such as the 
mean and variance using SPSS and SAS, we could use those commands to obtain point estimates as 
we will do with Minitab. Therefore, we will not give separate subsections for SPSS and SAS procedures. 
The following examples illustrate Minitab procedures. 


Example 5.7.1 
Generate 50 sample points from an N(4, 4) distribution and find the descriptive statistics. Obtain an
unbiased and sufficient estimate of $\mu$.
Solution
Because we know that the sample mean $\bar{x}$ is an unbiased and sufficient estimate of the population mean $\mu$,
we only need to find the sample mean of the generated data.


Calc > Random Data > Normal ... > Type 50 in Generate __ rows of data > Store in column(s): 
type C1 
> type in Mean: 4.0 and in Standard deviation: 2.0 > click OK. 


The following is one possible output. 


C1

4.76039 5.07819 4.85263 4.08032 6.77772 4.21677 51811
5.16925 3.68845 6.40513 6.13801 7.20015 2.41415 3.50008
3.25593 2.66181 1.01352 5.82506 6.04212 5.22235 5.29924
2.80955 4.19032 4.65449 3.48680 6.39083 6.56357 1.32281
2.43494 2.01465 4.02358 8.22997 2.44516 0.39563 3.78948
1.76723 3.15460 4.81882 0.36250 0.85002 14.47052 0.79586
2.86329 5.97599 7.75170 7.10011 6.61681 0.97982 4.01400
5.38503


Now follow the procedure to obtain the descriptive statistics from Example 1.8.3 to obtain 


Descriptive Statistics 


Variable    N     Mean      Median    TrMean    StDev    SE Mean
C1          50    4.116     4.135     4.115     2.047    0.289

Variable    Minimum    Maximum    Q1       Q3
C1          0.362      8.230      2.443    5.863


We can see that the unbiased and sufficient estimate of the mean $\mu$ for these data is $\bar{x} = 4.116$.


Example 5.7.2 
Generate 35 sample points from a U(0, 5) distribution and, using the descriptive statistics command, find the
maximum likelihood estimate for these data.


Solution
We know that for a random sample $X_1, \ldots, X_n$ from $U(0, \theta)$, the MLE is $\hat{\theta} = \max(X_i) = X_{(n)}$, the nth order
statistic. We can use the following steps to obtain the estimate.


Calc > Random Data > Uniform ... > Type 35 in Generate __ rows of data > Store in column(s):
type C1 > type in Lower end point: 0.0 and in Upper end point: 5.0 > click OK


One possible output is given below. 


C1

4.32848 4.79402 0.34515 0.08428 1.93000 0.27878 3.12992
0.07934 3.12453 1.69073 3.44003 0.47447 2.28072 0.49205
2.92537 2.39721 4.84440 1.79129 4.38718 3.60697 0.94159
4.20844 3.75506 4.56626 3.50280 1.95689 0.56969 1.02543
3.25272 4.61083 3.06527 2.34003 0.40877 2.52708 1.44525
Now follow the procedure to obtain the descriptive statistics from Example 1.8.3 to obtain


Descriptive Statistics


Variable    N     Mean      Median    TrMean    StDev    SE Mean
C1          35    2.417     2.397     2.413     1.541    0.260

Variable    Minimum    Maximum    Q1       Q3
C1          0.079      4.844      0.942    3.607


Therefore the MLE is $\hat{\theta} = 4.844$, the maximum of the generated data.


For the previous example, it should be noted that because we are generating random data, each time 
we follow this procedure, we will be getting different answers. When we have a particular data set, 
enter the data in C1 and just use the procedure to find the descriptive statistics. For other distributions, 
click the appropriate distribution in Random Data. 
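

For readers without Minitab, the two examples above can be mimicked with a short script. The following is only a sketch in Python using the standard library; the function names `normal_sample_mean` and `uniform_mle` and the seed values are illustrative assumptions, not part of the Minitab procedure, and the generated numbers will differ from the output shown above.

```python
import random
import statistics


def normal_sample_mean(n=50, mu=4.0, sigma=2.0, seed=None):
    """Draw n points from N(mu, sigma^2) and return (data, sample mean)."""
    rng = random.Random(seed)
    data = [rng.gauss(mu, sigma) for _ in range(n)]
    return data, statistics.mean(data)


def uniform_mle(n=35, lower=0.0, upper=5.0, seed=None):
    """Draw n points from U(lower, upper) and return (data, largest value).

    For U(0, theta) the largest observation is the MLE of theta.
    """
    rng = random.Random(seed)
    data = [rng.uniform(lower, upper) for _ in range(n)]
    return data, max(data)


# Example 5.7.1: unbiased and sufficient estimate of mu
normal_data, x_bar = normal_sample_mean(seed=1)
# Example 5.7.2: maximum likelihood estimate of theta
uniform_data, theta_hat = uniform_mle(seed=1)

print(f"estimate of mu:  {x_bar:.3f}")
print(f"MLE of theta:    {theta_hat:.3f}")
```

Changing or removing the seed gives a different sample on each run, which mirrors the remark that each run of the Minitab procedure gives different answers.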


PROJECTS FOR CHAPTER 5 
5A. Asymptotic Properties 


In general, we do not have a single sample with one estimator of the unknown parameter $\theta$. Rather,
we will have a general formula that defines an estimator for any sample size. This gives a sequence of
estimators of $\theta$:

$$\hat{\theta}_n = h_n(X_1, \ldots, X_n), \quad n = 1, 2, \ldots.$$

In this case, we can define the following asymptotic properties:
(i) The sequence of estimators $\hat{\theta}_n$ is said to be asymptotically unbiased for $\theta$ if $\mathrm{bias}(\hat{\theta}_n) \to 0$ as
$n \to \infty$.
(ii) Suppose $\hat{\theta}_n$ and $\hat{\delta}_n$ are two sequences of estimators that are asymptotically unbiased for $\theta$.
The asymptotic relative efficiency of $\hat{\theta}_n$ to $\hat{\delta}_n$ is defined by

$$\lim_{n \to \infty} \frac{\mathrm{Var}(\hat{\theta}_n)}{\mathrm{Var}(\hat{\delta}_n)}.$$

(a) Show that $\hat{\theta}_n$ is asymptotically unbiased if and only if

$$E(\hat{\theta}_n) \to \theta \quad \text{as } n \to \infty.$$




(b) Let $X_1, \ldots, X_n$ be a random sample from a distribution with unknown mean $\mu$ and variance
$\sigma^2$. It is known that the method of moments estimators for $\mu$ and $\sigma^2$ are, respectively, the
sample mean $\bar{X}$ and $S'^2 = (1/n) \sum_{i=1}^{n} (X_i - \bar{X})^2 = ((n - 1)/n) S^2$, where $S^2$ is the sample
variance.

(i) Show that $S'^2$ is an asymptotically unbiased estimator of $\sigma^2$.
(ii) Show that the asymptotic relative efficiency of $S'^2$ to $S^2$ is 1.
(iii) Show that $\mathrm{MSE}(S'^2) < \mathrm{MSE}(S^2)$. Thus, $S^2$ is unbiased but $S'^2$ has a smaller mean
square error. However, it should be noted that the difference is very small and approaches
zero as n becomes large.


5B. Robust Estimation 


The estimators derived in this chapter are for particular parameters of a presumed underlying family 
of distributions. However, if the choice of the underlying family of distributions is based on past 
experience, there is a possibility that the true population will be slightly different from the model 
used to derive the estimators. Formally, a statistical procedure is robust if its behavior is relatively 
insensitive to deviations from the assumptions on which it is based. If the behavior of an estimator 
is taken as its variance, a given estimator may have minimum variance for the distribution used, but 
it may not be very good for the actual distribution. Hence, it is desirable for the derived estimators 
to have small variance over a range of distributions. We call such estimators robust estimators. The 
following illustrates how the variance of an estimator can be affected by deviations from the presumed 
underlying population model. 


Consider estimating the mean of a standard normal distribution. Let $X_1, \ldots, X_n$ be a random sample
from a standard normal distribution. Suppose the population actually follows a contaminated
normal distribution. That is, for $0 < \delta < 1$, $100(1 - \delta)\%$ of the observations come from an N(0, 1)
distribution and the remaining $100\delta\%$ of observations come from an N(0, 5) distribution. We already
know that the minimum variance unbiased estimator of the mean $\mu$ of an uncontaminated normal
distribution is the sample mean. A less effective alternative would be the sample median.


(a) Conduct a simulation study that takes, say, 5000 random samples of n = 100
observations each. For each sample, find the mean and the median. Then find the sample variance
of the 5000 means and of the 5000 medians. For various values of $\delta$, say 0.0, 0.01, 0.05, 0.1, 0.2,
0.3, and 0.4, create a table of variances of the sample mean and the sample median. Compare the
variances as the value of $\delta$ increases.

(b) The aim of robust estimation is to derive estimators with variance near that of the sample
mean when the distribution is standard normal while having the variance remain relatively
stable as $\delta$ increases. One such estimator is the $\alpha$-trimmed mean. Let $0 < \alpha < 0.5$, and define
$k = [n\alpha]$, where $[x]$ is the greatest integer that is less than or equal to x. For the ordered
sample, discard the k highest and k lowest observations and find the mean of the remaining
$n - 2k$ observations. That is, let $X_{(1)} \le X_{(2)} \le \ldots \le X_{(n)}$ be the ordered sample, and define

$$\bar{X}_{\alpha} = \frac{X_{(k+1)} + X_{(k+2)} + \cdots + X_{(n-k)}}{n - 2k}.$$

For the values of $\delta$ and the samples in part (a), compute the mean and the 0.05-, 0.1-, 0.25-, and
0.5-trimmed means. Discuss the robustness.
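

One possible way to set up the simulation for this project is sketched below in Python (standard library only). Several choices here are assumptions made for illustration: N(0, 5) is read as variance 5, each observation is contaminated independently with probability equal to the contamination fraction, and the names `contaminated_sample`, `trimmed_mean`, and `estimator_variances` are hypothetical. The number of replications is kept small so that the sketch runs quickly.

```python
import math
import random
import statistics


def contaminated_sample(n, delta, rng):
    """Each observation is N(0, 5) with probability delta, otherwise N(0, 1)."""
    sd_wide = math.sqrt(5.0)  # N(0, 5) is taken to mean variance 5
    return [rng.gauss(0.0, sd_wide) if rng.random() < delta else rng.gauss(0.0, 1.0)
            for _ in range(n)]


def trimmed_mean(sample, alpha):
    """Discard the k = [n * alpha] smallest and largest values, average the rest."""
    n = len(sample)
    k = math.floor(n * alpha + 1e-9)  # small tolerance against rounding error
    if n - 2 * k <= 0:
        raise ValueError("alpha is too large: no observations remain")
    kept = sorted(sample)[k:n - k]
    return sum(kept) / len(kept)


def estimator_variances(delta, n=100, reps=300, alpha=0.1, seed=0):
    """Return the sample variances of the mean, median, and trimmed mean over reps samples."""
    rng = random.Random(seed)
    means, medians, trimmed = [], [], []
    for _ in range(reps):
        x = contaminated_sample(n, delta, rng)
        means.append(statistics.fmean(x))
        medians.append(statistics.median(x))
        trimmed.append(trimmed_mean(x, alpha))
    return tuple(statistics.variance(v) for v in (means, medians, trimmed))


print("delta  Var(mean)  Var(median)  Var(0.1-trimmed)")
for d in (0.0, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4):
    v_mean, v_median, v_trim = estimator_variances(d)
    print(f"{d:5.2f}  {v_mean:9.5f}  {v_median:11.5f}  {v_trim:16.5f}")
```

Raising `reps` toward 5000, as the project suggests, gives steadier variance estimates at the cost of a longer run.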
5C. Numerical Unbiasedness and Consistency
(a) Run the simulation of a normal experiment with increasing sample size. Numerically show 
the unbiased and consistent properties of the sample mean. Run the experiment at least up 
until n = 1000. 
(b) Repeat the experiment of part (a), now with an exponential distribution. 
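

A minimal sketch of such an experiment, in Python with the standard library, is given below. The population parameters (a normal with mean 10 and standard deviation 2, an exponential with mean 3), the number of replications, and the helper name `sample_mean_summary` are all illustrative assumptions. The average of the simulated sample means staying near the true mean suggests unbiasedness, and their variance shrinking with n suggests consistency.

```python
import random
import statistics


def sample_mean_summary(draw, n, reps=200, seed=0):
    """Simulate reps sample means of size n; return (average, variance) of those means."""
    rng = random.Random(seed)
    means = [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]
    return statistics.fmean(means), statistics.variance(means)


def normal_draw(rng):
    return rng.gauss(10.0, 2.0)        # assumed population: mean 10, sd 2


def expo_draw(rng):
    return rng.expovariate(1.0 / 3.0)  # assumed population: mean 3


for name, draw in (("normal", normal_draw), ("exponential", expo_draw)):
    for n in (10, 100, 1000):
        avg, var = sample_mean_summary(draw, n)
        print(f"{name:12s} n={n:5d}  average of means={avg:8.4f}  variance of means={var:.6f}")
```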


5D. Averaged Squared Errors (ASEs) 


Generate 25 samples of size 40 from a normal population with $\mu = 10$ and $\sigma^2 = 4$. For each of the
25 samples:
40 40 40 
Y @i-*)? Y Oi-x)?  Oi-x)? 
(a) Compute: X, s? = = .,—, sj} = = 47 —_ and s§ = |. 


(b) Compute the average squared error (ASE) for each of the estimates $s^2$, $s_1^2$, $s_2^2$ as follows.


K 
Let K® = [2 (xj; — »| /39] for K =1,2,...,25; and K™ be the sample variance for the 

i=1 
Kth sample. Then, the average squared error is 


Repeat this procedure for the other two estimators. Compare the three ASEs and check which 
has the least ASE. 
(c) Repeat (a) and (b) with a sample size of 15. 


5E. Alternate Method of Estimating the Mean and Variance 


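(a) Consider the following alternative method of estimating $\mu$ and $\sigma^2$. We sample sequentially,
and at each stage we compute the estimates of $\mu$ and $\sigma^2$ as follows.


Let $X_1, \ldots, X_n, X_{n+1}$ be the sample values.
Compute

$$\bar{X}_n = \frac{\sum_{i=1}^{n} X_i}{n}, \quad \bar{X}_{n+1} = \frac{\sum_{i=1}^{n+1} X_i}{n+1}, \quad S_n^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X}_n)^2}{n-1}, \quad \text{and} \quad S_{n+1}^2 = \frac{\sum_{i=1}^{n+1} (X_i - \bar{X}_{n+1})^2}{n}.$$

The sequential procedure is stopped when

$$\left| S_{n+1}^2 - S_n^2 \right| < 0.01.$$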


This will also determine the sample size. 
(b) Compare the sample sizes and estimates in 5D and 5E. 
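

The sequential procedure of part (a) can be sketched as follows in Python (standard library only). The function name `sequential_estimate`, the starting size of two observations, and the safety cap `max_n` are assumptions added for illustration; the stopping rule compares successive sample variances with the tolerance 0.01 as described above.

```python
import random
import statistics


def sequential_estimate(draw, tol=0.01, n_start=2, max_n=100_000):
    """Add one observation at a time until successive sample variances differ by < tol.

    draw is a zero-argument function returning one new observation.
    Returns (final sample size, sample mean, sample variance).
    """
    xs = [draw() for _ in range(n_start)]
    s2_prev = statistics.variance(xs)      # S_n^2 with divisor n - 1
    s2_next = s2_prev
    while len(xs) < max_n:
        xs.append(draw())
        s2_next = statistics.variance(xs)  # S_{n+1}^2 with divisor n
        if abs(s2_next - s2_prev) < tol:
            break
        s2_prev = s2_next
    return len(xs), statistics.fmean(xs), s2_next


rng = random.Random(42)
size, mean_hat, var_hat = sequential_estimate(lambda: rng.gauss(10.0, 2.0))
print(f"stopped at sample size {size}: mean = {mean_hat:.3f}, variance = {var_hat:.3f}")
```

Because the rule can stop early by chance, the final sample size varies considerably from run to run, which is worth keeping in mind when comparing with Project 5D.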
5F. Newton-Raphson in One Dimension


For a given function $g(x)$, suppose we need to solve $g(\theta) = 0$. Using the first-order Taylor expansion,
$g(\theta) \approx g(x) + (\theta - x) g'(x)$, where $g'(x) = dg/dx$, and setting $g(\theta) = 0$, we get $\theta \approx x - g(x)/g'(x)$. Thus, starting
with an initial guess solution x, the guess is updated to $\theta$ using the previous formula. This derivation
is the basis for the Newton-Raphson iterative method for obtaining the solution of $g(\theta) = 0$. This is
given by

$$\theta_{n+1} = \theta_n - \frac{g(\theta_n)}{g'(\theta_n)}, \quad n = 0, 1, 2, \ldots,$$

where $\theta_n$ is the value of $\theta$ at the nth iteration, starting with the initial guess, $\theta_0$. For a good approximation
of the solution, the choice of $\theta_0$ is important. The convergence of this algorithm cannot be
guaranteed.


For the MLE, we want to find a solution of

$$g(\theta) = \frac{dL}{d\theta} = 0,$$

where $L = L(\theta)$ is the likelihood function of the random sample $X_1, \ldots, X_n$. An iterative algorithm
for finding the MLE can be given by

$$\hat{\theta}_{n+1} = \hat{\theta}_n - \frac{L'(\hat{\theta}_n)}{L''(\hat{\theta}_n)}.$$

Write a computer program to find the MLE of $\alpha$ for a gamma distribution with parameters $\alpha$ and $\beta$.
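

As one possible starting point for this project, the sketch below applies the Newton-Raphson update in Python using only the standard library. It rests on several assumptions that are not in the text: the iteration is applied to the log-likelihood rather than to $L$ itself (both have the same maximizer), $\beta$ is taken as a scale parameter and replaced by $\bar{x}/\alpha$, so that the equation to solve is $\ln\alpha - \psi(\alpha) = \ln\bar{x} - \overline{\ln x}$, and the digamma function $\psi$ and its derivative are approximated by standard asymptotic series. All function names are hypothetical.

```python
import math
import random


def digamma(x):
    """Approximate psi(x) for x > 0: recurrence up to x >= 6, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))


def trigamma(x):
    """Approximate psi'(x) for x > 0 by the same recurrence-plus-series idea."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    return result + inv + 0.5 * inv2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 / 42))


def gamma_shape_mle(data, tol=1e-10, max_iter=100):
    """Newton-Raphson for g(alpha) = ln(alpha) - psi(alpha) - s = 0."""
    n = len(data)
    s = math.log(sum(data) / n) - sum(math.log(x) for x in data) / n
    if s <= 0:
        raise ValueError("data must be positive and not all equal")
    alpha = 0.5 / s  # initial guess theta_0 (a common rough approximation)
    for _ in range(max_iter):
        g = math.log(alpha) - digamma(alpha) - s
        g_prime = 1.0 / alpha - trigamma(alpha)
        new_alpha = alpha - g / g_prime  # theta_{n+1} = theta_n - g/g'
        if new_alpha <= 0:               # keep the iterate inside the parameter space
            new_alpha = alpha / 2.0
        if abs(new_alpha - alpha) < tol:
            return new_alpha
        alpha = new_alpha
    raise RuntimeError("Newton-Raphson did not converge")


rng = random.Random(0)
sample = [rng.gammavariate(3.0, 2.0) for _ in range(2000)]  # shape 3, scale 2
alpha_hat = gamma_shape_mle(sample)
beta_hat = (sum(sample) / len(sample)) / alpha_hat
print(f"alpha_hat = {alpha_hat:.4f}, beta_hat = {beta_hat:.4f}")
```

Trying different initial guesses in place of `0.5 / s` is a simple way to see how the choice of $\theta_0$ affects the number of iterations.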


5G. The Empirical Distribution Function 


The estimators in this chapter yield a single real value (point estimate) for each parameter. In Chapter 
6, we will learn about so-called interval estimates. In this project, we use an estimation procedure that 
estimates the whole distribution function, F, of a random variable X. We now define the empirical 
distribution. 


The empirical distribution function for a random sample $X_1, \ldots, X_n$ from a distribution F is the function
defined by

$$F_n(x) = \frac{1}{n}\, \#\{i : 1 \le i \le n,\ X_i \le x\}.$$

It can be shown that $nF_n(x)$ is a binomial random variable with

$$E[F_n(x)] = F(x) \quad \text{and} \quad \mathrm{Var}[F_n(x)] = \frac{1}{n} F(x)\,[1 - F(x)].$$

Also, by the strong law of large numbers, for each real number x,

$$\lim_{n \to \infty} F_n(x) = F(x) \quad \text{with probability 1.}$$

One of the tests to determine whether a random sample comes from a specific distribution is the
Kolmogorov-Smirnov (K-S) test. The K-S test is based on the maximum distance between the empirical
distribution function and the actual cumulative distribution function of this specific distribution
(such as, say, the normal distribution).

Using the method of Project 4A (or using any statistical software), generate 100 sample points from
a normal distribution with mean 2 and variance 9. Graph the empirical distribution function for this
sample. Compare this graph with the graph of the N(2, 9) distribution.
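

A small Python sketch of the empirical distribution function and of the maximum distance used by the K-S test follows. It uses only the standard library; the helper names `ecdf` and `ks_distance` are hypothetical, and the comparison is printed at a few points instead of being graphed, so plotting is left to whatever software is at hand.

```python
import bisect
import random
from statistics import NormalDist


def ecdf(sample):
    """Return the function F_n(x) = (number of sample values <= x) / n."""
    xs = sorted(sample)
    n = len(xs)

    def f_n(x):
        # bisect_right counts how many sorted values are <= x
        return bisect.bisect_right(xs, x) / n

    return f_n


def ks_distance(sample, cdf):
    """Largest vertical gap between the empirical cdf and a given cdf."""
    xs = sorted(sample)
    n = len(xs)
    # The step function jumps at each data point, so check just before and at each jump.
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs))


rng = random.Random(0)
model = NormalDist(mu=2.0, sigma=3.0)  # N(2, 9): variance 9, standard deviation 3
points = [rng.gauss(2.0, 3.0) for _ in range(100)]
f_100 = ecdf(points)

for x in (-4.0, -1.0, 2.0, 5.0, 8.0):
    print(f"x = {x:5.1f}   F_n(x) = {f_100(x):.3f}   F(x) = {model.cdf(x):.3f}")
print(f"maximum distance = {ks_distance(points, model.cdf):.4f}")
```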
Karl Pearson


(Source: http://www-history.mcs.st-and.ac.uk/~history/PictDisplay/Pearson.html)


Karl Pearson (1857-1936) is considered the founder of the 20th-century science of statistics.
Pearson contributed to several different fields such as anthropology, biometry, eugenics, scientific
method, and statistical theory. He applied statistics to biological problems of heredity and evolution.
He is the author of The Grammar of Science, the three volumes of The Life, Letters and Labors of Francis
Galton, and The Ethic of Free Thought. Pearson was the founder of the statistical journal Biometrika.
In 1900, he published a paper on the chi-square goodness of fit test. This is one of Pearson's most
significant contributions to statistics. In 1893, Pearson coined the term "standard deviation."


6.1 INTRODUCTION 


In the previous chapter, we studied methods for finding point estimators for the population parameters.
In general, the estimates will differ from the true parameter values by varying amounts depending
on the sample values obtained. In addition, the point estimates do not convey any measure of
reliability.


In this chapter, we discuss another type of estimation, called interval estimation. Although point
estimators are useful, interval estimators convey more information about the data that are used to
obtain the point estimate. The purpose of using an interval estimator is to have some degree of
confidence of capturing the true parameter. For an interval estimator of a single parameter $\theta$, we will
use the random sample to find two quantities L and U such that $L \le \theta \le U$ with some probability.
Because L and U depend on the sample values, they will be random. This interval (L, U) should
have two properties: (1) $P(L \le \theta \le U)$ is high, that is, the true parameter $\theta$ is in (L, U) with high
probability, and (2) the interval (L, U) should be relatively narrow on the average.


In summary, interval estimation goes a step beyond point estimation by providing, in addition to
the estimating interval (L, U), a measure of one's confidence in the accuracy of the estimate. Interval
estimators are called confidence intervals, and the limits U and L are called the upper and lower confidence
limits, respectively. The associated levels of confidence are determined by specified probabilities. The
width of the confidence interval reflects the amount of variability inherent in the point estimate.
Thus, our objective is to find a narrow interval with high probability of enclosing the true parameter,
$\theta$. We will restrict our attention to single parameter estimation.


The probability that a confidence interval will contain the true parameter $\theta$ is called the confidence
coefficient. The confidence coefficient gives the fraction of the time that the constructed interval will
contain the true parameter, under repeated sampling.


Let L and U be the lower and upper confidence limits for a parameter $\theta$ based on a random sample
$X_1, \ldots, X_n$. Both L and U are functions of the sample. We can write the interval estimate of $\theta$ as

$$P(L \le \theta \le U) = 1 - \alpha$$

and we read it as we are $(1 - \alpha)100\%$ confident that the true parameter $\theta$ is located in the interval
(L, U). The number $1 - \alpha$ is the confidence coefficient, and the interval (L, U) is referred to as a
$(1 - \alpha)100\%$ confidence interval ($(1 - \alpha)100\%$ CI) for $\theta$. Thus, if we want a 95% confidence interval
for, say, population mean $\mu$, then $\alpha = 0.05$. Note that for discrete random variables, we may not
be able to find a lower bound L and an upper bound U such that the probability, $P(L \le \theta \le U)$,
is exactly $(1 - \alpha)$. In such a case we can choose L and U such that $P(L \le \theta \le U) \ge 1 - \alpha$.

How do we find the confidence interval? For this, we use the error structure of the point estimator
to obtain this interval. For instance, we know that the sample mean, $\bar{X}$, is a point estimate (MLE or
unbiased estimator) of the population mean $\mu$. In this case, we know that the standard error of $\bar{X}$ is
$\sigma/\sqrt{n}$. If the sample came from a normal population, then for a 95% confidence interval for the mean,
multiply the standard error by 1.96 and then add and subtract this product from the sample mean.
From this we can also observe that, if everything else remains the same, the width of the confidence
interval decreases as the sample size increases.


Example 6.1.1 
As part of a promotion, the management of a large health club wants to estimate average weight loss for its 
members within the first 3 months after joining the club. They took a random sample of 45 members of this 
health club and found that they lost an average of 13.8 pounds within the first 3 months of membership 
with a sample standard deviation of 4.2 pounds. Find a 95% confidence interval for the true mean. What if 
a random sample of 200 members of this health club also resulted in the same sample mean and sample 
standard deviation? 


Solution 

Here a point estimate of the true mean $\mu$ is the sample mean $\bar{x} = 13.8$ pounds. Because n = 45 is large
enough, we can use the Central Limit Theorem and use approximate normality for the distribution of $\bar{X}$
with mean $\mu$ and the approximate standard error $4.2/\sqrt{45} \approx 0.626$. Thus a 95% confidence interval is
$13.8 \pm (1.96)(0.626)$, resulting in the interval (12.57, 15.03). Thus, with 95% confidence, one
can expect the true mean to lie in this interval.


For n = 200, the standard error is $4.2/\sqrt{200} \approx 0.297$. Thus a 95% confidence interval is
$13.8 \pm (1.96)(0.297)$, resulting in the interval (13.22, 14.38). Thus the more sample values (that is, the more
information) we have, the tighter (smaller width) the interval.
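

The arithmetic in this solution can be reproduced with a few lines of Python. This is only a sketch using the standard library; the function name `z_interval` is hypothetical, and the normal quantile is computed by `NormalDist().inv_cdf` instead of being rounded to 1.96, which does not change the intervals at two decimal places.

```python
import math
from statistics import NormalDist


def z_interval(x_bar, s, n, conf=0.95):
    """Large-sample confidence interval: x_bar +/- z * s / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 when conf = 0.95
    half_width = z * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


for n in (45, 200):
    low, high = z_interval(13.8, 4.2, n)
    print(f"n = {n:3d}: ({low:.2f}, {high:.2f})")
```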


The previous example was built on our knowledge of the sampling distribution of the sample mean. What 
if the sampling distribution of the statistic we are interested in is not readily available? More generally, our 
success in building confidence intervals for an estimate of a parameter depends on identifying a quantity 
known as the pivot. We now describe this method. 



6.1.1 A Method of Finding the Confidence Interval: Pivotal Method 


The pivotal method is a general method of constructing a confidence interval using a pivotal quantity. 
This relies on our knowledge of sampling distributions. Here we have to find a pivotal quantity with 
the following two characteristics: 


(i) It is a function of the random sample (a statistic or an estimator $\hat{\theta}$) and the unknown parameter
$\theta$, where $\theta$ is the only unknown quantity, and
(ii) It has a probability distribution that does not depend on the parameter $\theta$.


From (i) and (ii), it is important to note that the pivotal quantity depends on the parameter, but
its distribution is independent of the parameter. Let $X_1, \ldots, X_n$ be a random sample and let $\hat{\theta}$ be a
reasonable point estimate of $\theta$. For instance, $\hat{\theta}$ could be the maximum likelihood (or some other)
estimator of $\theta$. In general, finding a pivotal quantity may not be easy. However, if $\hat{\theta}$ is the sample
mean $\bar{X}$ or sample variance $S^2$, we could find a pivotal quantity with known sampling distributions.
Suppose $p(\hat{\theta}, \theta)$ is a pivotal quantity with known probability distribution that is independent of $\theta$.
(Usually, the probability distribution of the pivotal quantity will be standard normal, $t$, $\chi^2$, or
$F$-distribution.) The following are some of the standard pivotal quantities. If the sample $X_1, \ldots, X_n$
is from $N(\mu, \sigma^2)$:


(i) With $\mu$ unknown and $\sigma$ known, let $\bar{X}$ be the sample mean. Then the pivot is $(\bar{X} - \mu)/(\sigma/\sqrt{n})$,
which has an N(0, 1) distribution (see comments after Corollary 4.2.2).
(ii) With $\mu$ unknown and $\sigma$ unknown, the pivot is $(\bar{X} - \mu)/(S/\sqrt{n})$, which has a t-distribution
with $(n - 1)$ degrees of freedom (see Theorem 4.2.9). If n is large, using CLT,
the distribution of the pivot is approximately N(0, 1).
(iii) If $\sigma^2$ is unknown, then the pivot is $(n - 1)S^2/\sigma^2$, which has a $\chi^2$-distribution with $(n - 1)$
degrees of freedom (see Theorem 4.2.8).


For a given value of $\alpha$, $(0 < \alpha < 1)$, and constants a and b, with $(a < b)$, let

$$P(a \le p(\hat{\theta}, \theta) \le b) = 1 - \alpha.$$

Hence, given $\hat{\theta}$, the inequality is solved for $\theta$ to obtain a region of $\theta$ values, usually an interval
corresponding to the observed $\hat{\theta}$-value. The following examples illustrate the pivotal method.


Example 6.1.2 
Suppose we have a random sample $X_1, \ldots, X_n$ from $N(\mu, 1)$. Construct a 95% confidence interval for $\mu$.


Solution 

Here the confidence coefficient is 0.95. We know that the maximum likelihood estimator of $\mu$ is $\bar{X}$, which
has an $N(\mu, 1/n)$ distribution. Note that this distribution depends on the unknown value of $\mu$, and hence
$\bar{X}$ cannot be a pivot. However, taking the z-transform of $\bar{X}$, we obtain the pivotal quantity as

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \frac{\bar{X} - \mu}{1/\sqrt{n}},$$

which is a function of the sample measurements and $\mu$, and has an N(0, 1) distribution that does not depend on $\mu$.


Hence, this Z can be taken as a pivot $p(\hat{\theta}, \theta)$. Now we need to find a and b such that $P(a \le Z = p(\hat{\theta}, \theta) \le b) = 0.95$.
One such choice is to find the value of a such that $P(-a \le Z \le a) = 0.95$. From the normal table,

$$P(-z_{\alpha/2} \le Z \le z_{\alpha/2}) = 0.95,$$

where $z_{\alpha/2}$ represents the value of z with upper tail area $\alpha/2$. This implies $a = z_{\alpha/2} = 1.96$. Hence,

$$P(-1.96 \le Z \le 1.96) = 0.95$$

or, using the definition of Z and solving for $\mu$, we obtain

$$P\left(\bar{X} - \frac{1.96}{\sqrt{n}} \le \mu \le \bar{X} + \frac{1.96}{\sqrt{n}}\right) = 0.95.$$

Hence, a 95% confidence interval for $\mu$ is $(\bar{X} - 1.96/\sqrt{n},\ \bar{X} + 1.96/\sqrt{n})$. Thus, the lower confidence
limit L is $\bar{X} - 1.96/\sqrt{n}$ and the upper confidence limit U is $\bar{X} + 1.96/\sqrt{n}$.


From the derivation of Example 6.1.2, it follows that

$$P\left(\left|\bar{X} - \mu\right| \le z_{\alpha/2}\, \frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha.$$

Thus, for a normal population with known variance $\sigma^2$, if $\bar{X}$ is used as an estimator of the true mean
$\mu$, the probability that the error will be less than $z_{\alpha/2}\,\sigma/\sqrt{n}$ is $1 - \alpha$. It is important to note that there
is some arbitrariness in choosing a confidence interval for a given problem. There may be several
pivotals for $\theta$ that could be used. Also, it is not necessary to allocate equal probability to the two tails
of the distribution; however, doing so may result in the shortest length confidence interval for a given
confidence coefficient.


When we make the statement of the form

$$P\left(\bar{X} - \frac{1.96}{\sqrt{n}} \le \mu \le \bar{X} + \frac{1.96}{\sqrt{n}}\right) = 0.95,$$

we mean that, in an infinite series of trials in which repeated samples of size n are drawn from
the same population and 95% confidence intervals for $\mu$ are calculated by the same method for
each of the samples, the proportion of intervals that actually include $\mu$ will be 0.95. Figure 6.1
illustrates this idea for 20 samples of size n, where the vertical line represents the position of the true
mean $\mu$ and each of the horizontal lines represents the 95% confidence interval computed from one
of the samples.


FIGURE 6.1 95% confidence intervals for $\mu$.


A statement of the type $P(\bar{x} - 1.96/\sqrt{n} \le \mu \le \bar{x} + 1.96/\sqrt{n}) = 0.95$, where $\bar{x}$ is the observed
sample mean, is misleading. Once we calculate this interval using a particular sample, then either
this interval contains the true mean $\mu$ or not, and hence the probability will be either 0 or 1. Thus,
the correct interpretation of confidence interval for the population mean is that if samples of the
same size, n, are drawn repeatedly from a population, and a confidence interval is calculated from
each sample, then 95% of these intervals should contain the population mean. This is often stated as
"We are 95% confident that the true mean is in the interval $(\bar{X} - z_{\alpha/2}(\sigma/\sqrt{n}),\ \bar{X} + z_{\alpha/2}(\sigma/\sqrt{n}))$." Thus,
the correct interpretation requires the confidence limits to be variables. This concept of confidence
interval is attributed to Neyman.


FIGURE 6.2 Probability density of the pivot.
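

The repeated-sampling interpretation described above can be checked numerically. The sketch below, in Python with the standard library, draws many samples from a normal population with known $\sigma = 1$, builds the interval $\bar{x} \pm z_{\alpha/2}/\sqrt{n}$ from each, and reports the fraction of intervals that contain the true mean. The function name `coverage`, the sample size, and the number of replications are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(mu=0.0, n=25, reps=2000, conf=0.95, seed=0):
    """Fraction of simulated confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z / math.sqrt(n)  # sigma = 1 is treated as known
    hits = 0
    for _ in range(reps):
        x_bar = fmean(rng.gauss(mu, 1.0) for _ in range(n))
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / reps


print(f"observed coverage: {coverage():.3f}")
```

The observed fraction should be close to 0.95, although it will not equal 0.95 exactly for a finite number of replications.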


We can follow the accompanying procedure to find a confidence interval for the parameter 6. 


PROCEDURE TO FIND A CONFIDENCE INTERVAL FOR $\theta$ USING THE PIVOT

1. Find an estimator $\hat{\theta}$ of $\theta$: usually the MLE of $\theta$ works.

2. Find a function of $\hat{\theta}$ and $\theta$, $p(\hat{\theta}, \theta)$ (pivot), such that the probability distribution of $p(\cdot, \cdot)$ does not
depend on $\theta$.

3. Find a and b such that $P(a \le p(\hat{\theta}, \theta) \le b) = 1 - \alpha$. Choose a and b such that $P(p(\hat{\theta}, \theta) \le a) = \alpha/2$ and
$P(p(\hat{\theta}, \theta) \ge b) = \alpha/2$ (see Figure 6.2 where the shaded area in each side is $\alpha/2$).

4. Now, transform the pivot confidence interval to a confidence interval for the parameter $\theta$. That is,
work with the inequality in step 3 and rewrite it as $P(L \le \theta \le U) = 1 - \alpha$, where L is the lower
confidence limit and U is the upper confidence limit.


The following example is given to show that the success of finding a pivotal quantity depends on our 
ability to find the right transformation of the statistic and its distribution so that the transformed 
variable is a pivot. 


Example 6.1.3 
Suppose the random sample $X_1, \ldots, X_n$ has a $U(0, \theta)$ distribution. Construct a 90% confidence interval for
$\theta$ and interpret. Identify the upper and lower confidence limits.


Solution 
From Example 5.3.4, we know that

$$U = \max_{1 \le i \le n} X_i$$

is the MLE of $\theta$. The random variable U has the pdf

$$f_U(u) = \frac{n u^{n-1}}{\theta^n}, \quad 0 \le u \le \theta.$$

This is not independent of the parameter $\theta$. Let $Y = U/\theta$; then (using the Jacobians described in Chapter 3)
the pdf of Y is given by

$$f_Y(y) = n y^{n-1}, \quad 0 \le y \le 1.$$

Hence, Y satisfies the two characteristics of the pivotal quantity. Thus, $Y = U/\theta$ is a pivot. Now, we have to
find a and b such that

$$P\left(a \le \frac{U}{\theta} \le b\right) = 0.90.$$

To find a and b we use the cdf of Y, $F_Y(y) = y^n$, $0 \le y \le 1$, as follows. Placing probability 0.05 in each tail, we set

$$F_Y(a) = 0.05 \quad \text{and} \quad F_Y(b) = 0.95,$$

which implies that

$$a^n = 0.05 \quad \text{and} \quad b^n = 0.95,$$

resulting in

$$a = \sqrt[n]{0.05} \quad \text{and} \quad b = \sqrt[n]{0.95}.$$

Write

$$P\left(\sqrt[n]{0.05} \le \frac{U}{\theta} \le \sqrt[n]{0.95}\right) = 0.90.$$

Solving for $\theta$, the 90% confidence interval for $\theta$ is given by

$$P\left(\frac{U}{\sqrt[n]{0.95}} \le \theta \le \frac{U}{\sqrt[n]{0.05}}\right) = 0.90.$$

Thus, the lower confidence limit is $U/\sqrt[n]{0.95}$ and the upper confidence limit is $U/\sqrt[n]{0.05}$, and the 90%
confidence interval is $(U/\sqrt[n]{0.95},\ U/\sqrt[n]{0.05})$.


We can interpret this in the following manner. In a large number of trials in which repeated samples
are taken from a population with uniform pdf with parameter $\theta$, approximately 90% of the intervals
will contain $\theta$. For instance, if we observed n = 20 values from a uniform distribution with the
maximum observed value being 15, then a 90% confidence interval for $\theta$ is (15.04, 17.42). Thus, we
are 90% confident that these data came from a uniform distribution with upper limit $\theta$ falling somewhere
in this interval.
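

The numerical interval quoted above follows directly from the limits $U/\sqrt[n]{0.95}$ and $U/\sqrt[n]{0.05}$. A short Python check is sketched below; the function name `uniform_theta_interval` is hypothetical, and the equal-tail split of the error probability is the one used in the example.

```python
def uniform_theta_interval(u_max, n, conf=0.90):
    """Equal-tail pivotal interval for theta in U(0, theta), given the sample maximum."""
    tail = (1 - conf) / 2                      # 0.05 in each tail when conf = 0.90
    lower = u_max / (1 - tail) ** (1 / n)      # U divided by the nth root of 0.95
    upper = u_max / tail ** (1 / n)            # U divided by the nth root of 0.05
    return lower, upper


low, high = uniform_theta_interval(15, 20)
print(f"90% confidence interval for theta: ({low:.2f}, {high:.2f})")
```

Note that the lower limit is only slightly above the observed maximum, since $\theta$ can never be smaller than the largest observation.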


It is important to note that the pivotal method may not be applicable in all situations. For example, 
in the binomial case, to find a confidence interval for p, there is no quantity that satisfies the two 
conditions of a pivot. However, if sample size is large, then the z-score of sample proportion can be 
used as a pivot with an approximate standard normal distribution. For the pivotal method to work,
the distribution of the pivotal quantity must, as a practical matter, make it easy to compute the
probabilities. In cases where the pivotal method does not work, we may need to use other techniques
such as the method based on sampling distributions (see Project 4A). A proper discussion of these 
methods is beyond the level of this book. 


EXERCISES 6.1 


6.1.1. (a) Suppose we construct a 99% confidence interval. What are we 99% confident about? 
(b) Which of the confidence intervals is wider, 90% or 99%? 
(c) In computing a confidence interval, when do you use the t-distribution and when do 
you use z, with normal approximation? 
(d) How does the sample size affect the width of a confidence interval? 


6.1.2. Suppose X is a random sample of size n = 1 from a uniform distribution defined on the
interval $(0, \theta)$. Construct a 98% confidence interval for $\theta$ and interpret.
6.1.3. Consider the probability statement

$$P\left(-2.01 < Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 2.73\right) = k,$$

where $\bar{X}$ is the mean of a random sample of size n from an $N(\mu, \sigma^2)$ distribution with
known $\sigma^2$.


(a) Find k.

(b) Use this statement to find a confidence interval for $\mu$.
(c) What is the confidence level of this confidence interval?
(d) Find a symmetric confidence interval for $\mu$.


6.1.4. A random sample of size 50 from a particular brand of 16-ounce tea packets produced a
mean weight of 15.65 ounces. Assume that the weights of this brand of tea packets are
normally distributed with standard deviation of 0.59 ounce. Find a 95% confidence interval
for the true mean $\mu$.


6.1.5. Let $X_1, \ldots, X_n$ be a random sample from an $N(\mu, \sigma^2)$ distribution, where the value of $\sigma^2$ is unknown.

(a) Construct a $(1 - \alpha)100\%$ confidence interval for $\sigma^2$, choosing an appropriate pivot.
Interpret its meaning.

(b) Suppose a random sample from a normal distribution gives the following summary
statistics: n = 21, $\bar{x} = 44.3$, and s = 3.96. Using part (a), find a 90% confidence interval
for $\sigma^2$. Interpret its meaning.


6.1.6. Let $X_1, \ldots, X_n$ be a random sample from a gamma distribution with $\alpha = 2$ and unknown
$\beta$. Construct a 95% confidence interval for $\beta$.


6.1.7. Let $X_1, \ldots, X_n$ be a random sample from an exponential distribution with pdf $f(x) =
(1/\theta)e^{-x/\theta}$, $\theta > 0$, $x > 0$. Construct a 95% confidence interval for $\theta$ and interpret. [Hint:
Recall that $\sum_{i=1}^{n} X_i$ has a gamma distribution with $\alpha = n$, $\beta = \theta$.]


6.1.8. Let $X_1, \ldots, X_n$ be a random sample from a Poisson distribution with parameter $\lambda$.


(a) Construct a 90% confidence interval for $\lambda$.

(b) Suppose that the number of raisins in a bowl of a particular brand of cereal is observed 
to be 25. Assuming that the number of raisins in a bowl is Poisson distributed, estimate 
the expected number of raisins per bowl with a 90% confidence interval. 

(c) How many bowls of cereal need to be sampled in order to estimate the expected number 
of raisins per bowl with a standard error of less than 0.2? 


6.1.9. Let $X_1, \ldots, X_n$ be a random sample from an $N(\mu, \sigma^2)$ distribution.


(a) Construct a $(1 - \alpha)100\%$ confidence interval for $\mu$ when the value of $\sigma^2$ is known.
(b) Construct a $(1 - \alpha)100\%$ confidence interval for $\mu$ when the value of $\sigma^2$ is unknown.


6.1.10. Let $X_1, \ldots, X_n$ be a random sample from an $N(\mu_1, \sigma^2)$ population and $Y_1, \ldots, Y_n$ be an
independent random sample from an $N(\mu_2, \sigma^2)$ distribution, where $\sigma^2$ is assumed to be
known. Construct a $(1 - \alpha)100\%$ interval for $(\mu_1 - \mu_2)$. Interpret its meaning.


6.1.11. Let $X_1, \ldots, X_n$ be a random sample from a uniform distribution on $[\theta, \theta + 1]$. Find a 99%
confidence interval for $\theta$, using an appropriate pivot.


6.2 LARGE SAMPLE CONFIDENCE INTERVALS: ONE SAMPLE CASE 


If the sample size is large, then by the Central Limit Theorem, certain sampling distributions can be
assumed to be approximately normal. That is, if $\theta$ is an unknown parameter (such as $\mu$, $p$, $\mu_1 - \mu_2$,
or $p_1 - p_2$), then for large samples, by the Central Limit Theorem, the z-transform

$$z = \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}}$$

possesses an approximately standard normal distribution, where $\hat{\theta}$ is the MLE of $\theta$ and $\sigma_{\hat{\theta}}$ is its standard
deviation. Then, as in Example 6.1.1, the pivotal method can be used to obtain the confidence interval
for the parameter $\theta$. For $\theta = \mu$, $n \ge 30$ will be considered large; for the binomial parameter $p$, $n$ is
considered large if $np$ and $n(1 - p)$ are both greater than 5.


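PROCEDURE TO CALCULATE LARGE SAMPLE CONFIDENCE INTERVAL FOR $\theta$
1. Find an estimator (such as the MLE) of $\theta$, say $\hat{\theta}$.
2. Obtain the standard error, $\sigma_{\hat{\theta}}$, of $\hat{\theta}$.
3. Find the z-transform $z = (\hat{\theta} - \theta)/\sigma_{\hat{\theta}}$. Then z has an approximately standard normal distribution.
4. Using the normal table, find two tail values $-z_{\alpha/2}$ and $z_{\alpha/2}$.
5. An approximate $(1 - \alpha)100\%$ confidence interval for $\theta$ is $(\hat{\theta} - z_{\alpha/2}\sigma_{\hat{\theta}},\ \hat{\theta} + z_{\alpha/2}\sigma_{\hat{\theta}})$, that is,

$$P\left(\hat{\theta} - z_{\alpha/2}\sigma_{\hat{\theta}} < \theta < \hat{\theta} + z_{\alpha/2}\sigma_{\hat{\theta}}\right) = 1 - \alpha.$$

6. Conclusion: We are $(1 - \alpha)100\%$ confident that the true parameter $\theta$ lies in the interval
$(\hat{\theta} - z_{\alpha/2}\sigma_{\hat{\theta}},\ \hat{\theta} + z_{\alpha/2}\sigma_{\hat{\theta}})$.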


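Example 6.2.1
Let $\hat{\theta}$ be a statistic that is normally distributed with mean $\theta$ and standard deviation $\sigma_{\hat{\theta}}$, where $\sigma_{\hat{\theta}}$ is assumed
to be known. Find a confidence interval for $\theta$ that possesses a confidence coefficient equal to $1 - \alpha$.


Solution

The z-transform of $\hat{\theta}$ is

$$Z = \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}}$$

and has a standard normal distribution. Select two tail values $-z_{\alpha/2}$ and $z_{\alpha/2}$ such that

$$P(-z_{\alpha/2} < Z < z_{\alpha/2}) = 1 - \alpha.$$

Because of symmetry, this is the shortest interval that contains the area $1 - \alpha$. Then,

$$P\left(\hat{\theta} - z_{\alpha/2}\sigma_{\hat{\theta}} < \theta < \hat{\theta} + z_{\alpha/2}\sigma_{\hat{\theta}}\right) = 1 - \alpha.$$

Therefore, the confidence limits of $\theta$ are $\hat{\theta} - z_{\alpha/2}\sigma_{\hat{\theta}}$ and $\hat{\theta} + z_{\alpha/2}\sigma_{\hat{\theta}}$. Hence, a $(1 - \alpha)100\%$ confidence interval
for $\theta$ is given by $\hat{\theta} \pm z_{\alpha/2}\sigma_{\hat{\theta}}$.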


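In particular, for a large sample of size n, let $\hat{\theta} = \bar{X}$ be the sample mean. Then the large sample
$(1 - \alpha)100\%$ confidence interval for the population mean $\mu$ is

$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \approx \bar{X} \pm z_{\alpha/2}\frac{S}{\sqrt{n}},$$

where S is a point estimate of $\sigma$. That is,

$$P\left(\bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}}\right) \approx 1 - \alpha.$$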


As we have seen in Section 6.1, the correct interpretation of this confidence interval is that in
repeated sampling, approximately $(1 - \alpha)100\%$ of all intervals of the form $\bar{X} \pm z_{\alpha/2}(S/\sqrt{n})$ include $\mu$, the true
mean. Suppose $\bar{x}$ and s are the sample mean and the sample standard deviation, respectively, for a
particular set of n observed sample values $x_1, \ldots, x_n$. Then we do not know whether the particular
interval $(\bar{x} - z_{\alpha/2}(s/\sqrt{n}),\ \bar{x} + z_{\alpha/2}(s/\sqrt{n}))$ contains $\mu$. However, the procedure that produced this
interval does capture the true mean in approximately $(1 - \alpha)100\%$ of cases. This interpretation will
be assumed hereafter, when we make a statement such as, “We are 95% confident that the true mean
will lie in the interval (74.1, 79.8).”
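
The repeated-sampling interpretation can be illustrated by simulation. The sketch below is only an illustration under assumed settings: the population is taken to be normal with mean 75 and standard deviation 10, the sample size is 50, and the function name `coverage` and the seed are arbitrary choices that do not come from the text.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def coverage(mu=75.0, sigma=10.0, n=50, conf=0.95, reps=2000, seed=1):
    """Fraction of simulated large-sample intervals xbar +/- z*s/sqrt(n) that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)  # z_{alpha/2}
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        xbar, s = mean(sample), stdev(sample)  # stdev uses the n - 1 divisor
        half_width = z * s / sqrt(n)
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps


rate = coverage()
print(f"empirical coverage: {rate:.3f}")
```

The printed proportion should be close to 0.95, though it will not equal it exactly. Any single interval either contains the true mean or does not; the 95% refers to the long-run behavior of the procedure.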


Example 6.2.2
Two statistics professors want to estimate average scores for an elementary statistics course that has two
sections. Each professor teaches one section and each section has a large number of students. A random
sample of 50 scores from each section produced the following results:
(a) Section I: $\bar{x}_1 = 77.01$, $s_1 = 10.32$
(b) Section II: $\bar{x}_2 = 72.22$, $s_2 = 11.02$


Calculate 95% confidence intervals for each of these two samples.
Solution
Because n = 50 is large, we can use the normal approximation. For $\alpha = 0.05$, from the normal table,
$z_{\alpha/2} = z_{0.025} = 1.96$. The confidence intervals are:
(a) We have

$$\bar{x}_1 \pm z_{\alpha/2}\frac{s_1}{\sqrt{n}} = 77.01 \pm 1.96\,\frac{10.32}{\sqrt{50}},$$

which gives a 95% confidence interval (74.149, 79.871).
(b) We can compute

$$\bar{x}_2 \pm z_{\alpha/2}\frac{s_2}{\sqrt{n}} = 72.22 \pm 1.96\,\frac{11.02}{\sqrt{50}},$$

which gives the interval (69.165, 75.275).
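
The two intervals of Example 6.2.2 can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the helper name `large_sample_mean_ci` is hypothetical, and the critical value is taken from `statistics.NormalDist` rather than a printed normal table, so the last digit may differ slightly from hand calculations with 1.96.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_mean_ci(xbar, s, n, conf=0.95):
    """Approximate CI xbar +/- z_{alpha/2} * s / sqrt(n) for a large sample."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    half_width = z * s / sqrt(n)
    return xbar - half_width, xbar + half_width


ci_1 = large_sample_mean_ci(77.01, 10.32, 50)  # Section I
ci_2 = large_sample_mean_ci(72.22, 11.02, 50)  # Section II
print(ci_1)
print(ci_2)
```

Both results agree with (74.149, 79.871) and (69.165, 75.275) to about three decimal places.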


It may be noted that if the population is normal with a known variance $\sigma^2$, we can use $\bar{X} \pm z_{\alpha/2}(\sigma/\sqrt{n})$
as the confidence interval for the population mean $\mu$, irrespective of the sample size. However, if $\sigma^2$ is
unknown, in order to use $\bar{X} \pm z_{\alpha/2}(s/\sqrt{n})$ as an approximate confidence interval for $\mu$, the sample size
has to be large for the Central Limit Theorem to hold. However, to use this approximate procedure,
we do not need the condition that samples arise from a normal distribution. We will consider the sample
size to be large if $n \ge 30$ (applicable to estimators of the mean). If not, we shall use the small sample
procedure discussed in the next section.


Example 6.2.3
Fifteen vehicles were observed at random for their speeds (in mph) on a highway with speed limit posted
as 70 mph, and it was found that their average speed was 73.3 mph. Suppose that from past experience we
can assume that vehicle speeds are normally distributed with $\sigma = 3.2$. Construct a 90% confidence interval
for the true mean speed $\mu$ of the vehicles on this highway. Interpret the result.


Solution

Because the population is given to be normal with known standard deviation $\sigma = 3.2$, the sample size need not be
large. Here, $\bar{x} = 73.3$, n = 15, and $\alpha = 0.10$. Thus, $z_{\alpha/2} = z_{0.05} = 1.645$. Hence, a
90% confidence interval for $\mu$ is given by

$$73.3 - 1.645\,\frac{3.2}{\sqrt{15}} < \mu < 73.3 + 1.645\,\frac{3.2}{\sqrt{15}},$$

or

$$71.941 < \mu < 74.659.$$


Interpretation: We are 90% confident that the true mean speed $\mu$ of the vehicles on this highway is between
71.941 and 74.659 mph.
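
As a numerical check on the arithmetic, the following sketch evaluates $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ for the data of Example 6.2.3 at two confidence levels. The function name `z_interval_known_sigma` is hypothetical, and the critical values come from `statistics.NormalDist` rather than the printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval_known_sigma(xbar, sigma, n, conf):
    """CI for the mean of a normal population with known sigma (any sample size)."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


ci_90 = z_interval_known_sigma(73.3, 3.2, 15, 0.90)  # uses z_{0.05}, about 1.645
ci_95 = z_interval_known_sigma(73.3, 3.2, 15, 0.95)  # uses z_{0.025}, about 1.96
print([round(v, 3) for v in ci_90])
print([round(v, 3) for v in ci_95])
```

With $z_{0.05} \approx 1.645$ the half-width is about 1.359, giving roughly (71.941, 74.659); with $z_{0.025} \approx 1.96$ the half-width grows to about 1.619, giving roughly (71.681, 74.919). A higher confidence level always produces a wider interval from the same data.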


6.2.1 Confidence Interval for Proportion, p 


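Consider a binomial distribution with parameter p. Let X be the number of successes in n trials. Then the
maximum likelihood estimator $\hat{p}$ of p is $\hat{p} = X/n$. It can be shown, using the procedure outlined at the
beginning of this section, that an approximate large sample $(1 - \alpha)100\%$ confidence interval for p is

$$\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}},\ \ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right).$$

That is,

$$P\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}\right) \approx 1 - \alpha.$$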


A natural question is: “How do we determine whether the sample size that we have is sufficient for the normal
approximation that is used in the foregoing formula?” There are various rules of thumb that are used
to determine the adequacy of the sample size for normal approximation. Some of the popular rules
are that $np$ and $n(1 - p)$ should be greater than 10, or that $p \pm 2\sqrt{p(1 - p)/n}$ should be contained
in the interval (0, 1), or $np(1 - p) > 10$, etc. All of these rules perform poorly when p is near 0
or 1. Recently, there have been many works on coverage analysis for confidence intervals. We refer to
a survey article by Lee et al. for more details on this topic. For simplicity of calculations, we will use
the rule that $np$ and $n(1 - p)$ are both greater than 5.
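
The rules of thumb listed above are easy to evaluate side by side. The sketch below assumes that the unknown p is replaced by the estimate $\hat{p} = x/n$ when applying each rule, which is a practical convention and not something the rules themselves state. The function name and dictionary keys are hypothetical.

```python
from math import sqrt


def normal_approx_rules(x, n):
    """Evaluate several rules of thumb for the normal approximation, using p_hat = x/n."""
    p_hat = x / n
    q_hat = 1.0 - p_hat
    two_se = 2.0 * sqrt(p_hat * q_hat / n)
    return {
        "np_and_nq_gt_5": n * p_hat > 5 and n * q_hat > 5,     # rule adopted in this text
        "np_and_nq_gt_10": n * p_hat > 10 and n * q_hat > 10,
        "p_pm_2se_inside_0_1": 0.0 < p_hat - two_se and p_hat + two_se < 1.0,
        "npq_gt_10": n * p_hat * q_hat > 10,
    }


checks = normal_approx_rules(x=20, n=60)  # 20 successes in 60 trials
small = normal_approx_rules(x=2, n=30)    # a rare event in a small sample
print(checks)
print(small)
```

For 20 successes in 60 trials every rule is satisfied, whereas for 2 successes in 30 trials every rule fails, which signals that the large sample interval should not be trusted there.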


Example 6.2.4 

An auto manufacturer gives a bumper-to-bumper warranty for 3 years or 36,000 miles for its new vehicles. 
In a random sample of 60 of its vehicles, 20 of them needed five or more major warranty repairs within the 
warranty period. Estimate the true proportion of vehicles from this manufacturer that need five or more 
major repairs during the warranty period, with confidence coefficient 0.95. Interpret. 


Solution 
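Here we need to find a 95% confidence interval for the true proportion, p. Here, $\hat{p} = 20/60 = 1/3$. For
$\alpha = 0.05$, $z_{\alpha/2} = z_{0.025} = 1.96$. Hence, a 95% confidence interval for p is

$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \frac{1}{3} \pm 1.96\sqrt{\frac{\left(\frac{1}{3}\right)\left(\frac{2}{3}\right)}{60}},$$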


which gives the confidence interval as (0.21405, 0.45262). That is, we are 95% confident that the true 
proportion of vehicles from this manufacturer that need five or more major repairs during the warranty 
period will lie in the interval (0.21405, 0.45262). 
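
The interval in Example 6.2.4 can be reproduced as follows. This is a minimal sketch of the large sample interval $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}$; the function name `proportion_ci` is hypothetical, and the critical value comes from `statistics.NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(x, n, conf=0.95):
    """Large sample CI for a binomial proportion from x successes in n trials."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


lower, upper = proportion_ci(20, 60)
print(f"({lower:.4f}, {upper:.4f})")
```

The result matches the interval (0.21405, 0.45262) to within rounding of the critical value.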



6.2.2 Margin of Error and Sample Size 

In real-world problems, the estimates of the proportion p are usually accompanied by a margin of 
error, rather than a confidence interval. For example, in the news media, especially leading up to 
election time, we hear statements such as “The CNN/USA Today/Gallup poll of 818 registered voters 
taken on June 27-30 showed that if the election were held now, the president would beat his challenger 
52% to 40%, with 8% undecided. The poll had a margin of error of plus or minus four percentage 
points.” What is this “margin of error”? According to the American Statistical Association, the margin 
of error is a common summary of sampling error that quantifies uncertainty about a survey result. 
Thus, the margin of error is nothing but a confidence interval. The number quoted in the foregoing 
statement is half the maximum width of a 95% confidence interval, expressed as a percentage. 


Let b be the width of a 95% confidence interval for the true proportion, p. Let $\hat{p} = x/n$ be an estimate
for p, where x is the number of successes in n trials. Then,

$$b = \left(\frac{x}{n} + 1.96\sqrt{\frac{(x/n)(1 - (x/n))}{n}}\right) - \left(\frac{x}{n} - 1.96\sqrt{\frac{(x/n)(1 - (x/n))}{n}}\right) = 3.92\sqrt{\frac{(x/n)(1 - (x/n))}{n}} \le 3.92\sqrt{\frac{1}{4n}},$$

because $(x/n)(1 - (x/n)) = \hat{p}(1 - \hat{p}) \le \frac{1}{4}$.
Thus, the margin of error associated with $\hat{p} = x/n$ is 100d%, where

$$d = \frac{\max b}{2} = \frac{3.92\sqrt{\frac{1}{4n}}}{2} = \frac{1.96}{2\sqrt{n}}.$$

From the foregoing derivation, it is clear that we can compute the margin of error for other values of
$\alpha$ by replacing 1.96 by the corresponding value of $z_{\alpha/2}$.
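
To see what the formula $d = z_{\alpha/2}/(2\sqrt{n})$ gives for the poll quoted at the start of this subsection (818 registered voters), consider the following sketch. The function name `max_margin_of_error` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def max_margin_of_error(n, conf=0.95):
    """Worst-case (p_hat = 1/2) margin of error d = z_{alpha/2} / (2 * sqrt(n))."""
    z = NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)
    return z / (2.0 * sqrt(n))


d = max_margin_of_error(818)
print(f"{100 * d:.1f} percentage points")  # 3.4 percentage points
```

The computed value of about 3.4 percentage points is consistent with the reported "plus or minus four percentage points" if that figure was rounded up, although the text does not say how the pollster arrived at it.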


A quick look at the formula for the confidence interval for proportions reveals that a larger sample
yields a shorter interval (other things being equal) and hence a more precise estimate
of p. A larger sample is more costly in terms of time, resources, and money, whereas samples that
are too small may result in inaccurate inferences. It is therefore useful to find the
minimum sample size required (and thus the least costly) to achieve a prescribed degree of precision (usually,
the minimum degree of precision acceptable). We have seen that the large sample $(1 - \alpha)100\%$
confidence interval for p is

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}.$$

Rewriting it, we have

$$|\hat{p} - p| \le z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} = \frac{z_{\alpha/2}}{\sqrt{n}}\sqrt{\hat{p}(1 - \hat{p})},$$

which shows that, with probability $(1 - \alpha)$, the estimate $\hat{p}$ is within $z_{\alpha/2}\sqrt{\hat{p}(1 - \hat{p})/n}$ units of p.
Because $\hat{p}(1 - \hat{p}) \le 1/4$ for all values of $\hat{p}$, we can write the foregoing inequality as

$$|\hat{p} - p| \le \frac{z_{\alpha/2}}{\sqrt{n}}\sqrt{\frac{1}{4}} = \frac{z_{\alpha/2}}{2\sqrt{n}}.$$

If we wish to estimate p at level $(1 - \alpha)$ to within d units of its true value, that is, $|\hat{p} - p| \le d$, the
sample size must satisfy the condition $z_{\alpha/2}/(2\sqrt{n}) \le d$, or

$$n \ge \frac{z_{\alpha/2}^2}{4d^2}.$$

Thus, to estimate p at level $(1 - \alpha)$ to within d units of its true value, take the minimal sample size as
$n = z_{\alpha/2}^2/(4d^2)$, and if this is not an integer, round up to the next integer.


Sometimes, we may have an initial estimate $\tilde{p}$ of the parameter p from a similar process or from a
pilot study or simulation. In this case, the minimum
sample size required to estimate p, at level $(1 - \alpha)$, to within d units is given by the formula

$$n = \frac{z_{\alpha/2}^2\,\tilde{p}(1 - \tilde{p})}{d^2},$$

and, if this is not an integer, rounding up to the next integer.


A similar derivation shows that the sample size for estimating the population mean $\mu$, at level
$(1 - \alpha)$ with margin of error E, is given by

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2},$$

and, if this is not an integer, rounding up to the next integer. This formula can be used only if we
know the population standard deviation, $\sigma$. Although we are unlikely to know $\sigma$ when the population
mean itself is not known, we may be able to determine $\sigma$ from an earlier similar study or from a pilot
study/simulation.


Example 6.2.5
A dendritic tree is a branched formation that originates from a nerve cell. In order to study brain
development, researchers want to examine the brain tissues from adult guinea pigs. How many cells must the
researchers select (randomly) so as to be 95% sure that the sample mean is within 3.4 cells of the population
mean? Assume that a previous study has shown $\sigma = 10$ cells.


Solution
A 95% confidence corresponds to $\alpha = 0.05$. Thus, from the normal table, $z_{\alpha/2} = z_{0.025} = 1.96$. Given that
E = 3.4 and $\sigma = 10$, and using the sample size formula, the required sample size n is

$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{E^2} = \frac{(1.96)^2 (10)^2}{(3.4)^2} = 33.232.$$

Thus, take n = 34.


Example 6.2.6 
Suppose that a local TV station in a city wants to conduct a survey to estimate support for the president's 
policies on economy within 3% error with 95% confidence. 
(a) How many people should the station survey if they have no information on the support level? 
(b) Suppose they have an initial estimate that 70% of the people in the city support the economic 
policies of the president. How many people should the station survey? 


Solution
Here $\alpha = 0.05$, and thus $z_{\alpha/2} = 1.96$. Also, d = 0.03.
(a) With no information on p, we use the sample size formula:

$$n = \frac{z_{\alpha/2}^2}{4d^2} = \frac{(1.96)^2}{4(0.03)^2} = 1067.11.$$

Hence, the TV station must survey at least 1068 people.
(b) Because $\tilde{p} = 0.7$, the required sample size is calculated from

$$n = \frac{z_{\alpha/2}^2\,\tilde{p}(1 - \tilde{p})}{d^2} = \frac{(1.96)^2 (0.70)(0.30)}{(0.03)^2} = 896.37.$$

Thus, the TV station must survey at least 897 people.
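
The sample size formulas of this subsection can be collected into two small helpers. This is a sketch only: the function names `n_for_proportion` and `n_for_mean` are hypothetical, and the critical value is computed with `statistics.NormalDist` rather than read from the normal table.

```python
from math import ceil
from statistics import NormalDist


def z_crit(conf):
    """Upper alpha/2 point of the standard normal distribution."""
    return NormalDist().inv_cdf(1.0 - (1.0 - conf) / 2.0)


def n_for_proportion(d, conf=0.95, p_guess=None):
    """Smallest n estimating p to within d; uses p(1 - p) <= 1/4 when no guess is given."""
    pq = 0.25 if p_guess is None else p_guess * (1.0 - p_guess)
    return ceil(z_crit(conf) ** 2 * pq / d ** 2)


def n_for_mean(E, sigma, conf=0.95):
    """Smallest n estimating the mean to within E when sigma is (assumed) known."""
    return ceil((z_crit(conf) * sigma / E) ** 2)


print(n_for_mean(E=3.4, sigma=10))            # Example 6.2.5 -> 34
print(n_for_proportion(d=0.03))               # Example 6.2.6(a) -> 1068
print(n_for_proportion(d=0.03, p_guess=0.7))  # Example 6.2.6(b) -> 897
```

Using a prior guess for p reduces the required sample size because $\tilde{p}(1 - \tilde{p})$ is largest at $\tilde{p} = 1/2$; the conservative formula protects against the worst case.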


In practice, we should realize that one of the key factors of a good design is not sample size by 
itself; it is getting representative samples. Even if we have a very large sample size, if the sample is 
not representative of our target population, then sample size means nothing. Therefore, whenever 
possible, we should use random sampling procedures (or other appropriate sampling procedures) to 
ensure that our target population is properly represented. 


EXERCISES 6.2 


6.2.1. A survey indicates that it is important to pay attention to truth in political advertising. Based 
on a survey of 1200 people, 35% indicated that they found political advertisements to be 
untrue; 60% say that they will not vote for candidates whose advertisements are judged to be 
untrue; and of this latter group, only 15% ever complained to the media or to the candidate 
about their dissatisfaction. 

(a) Find a 95% confidence interval for the percentage of people who find political 
advertising to be untrue. 

(b) Find a 95% confidence interval for the percentage of voters who will not vote for 
candidates whose advertisements are considered to be untrue. 

(c) Find a 95% confidence interval for the percentage of those who avoid voting for can- 
didates whose advertisements are considered untrue and who have complained to the 
media or to the candidate about the falsehood in commercials. 

(d) For each case above, interpret the results and state any assumptions you have made. 


6.2.2. Many mutual funds use an investment approach involving owning stocks whose price/earn- 
ings multiples (P/Es) are less than the P/E of the S&P 500. The following data give P/Es of 
49 companies a randomly selected mutual fund owns in a particular year. 


68 56 85 85 84 75 93 94 7.8 7.1 
99 96 90 94 13.7 166 9.1 10.1 10.6 11.1 
8.9 11.7 128 115 12.0 106 11.1 64 12.3 12.3 
1.4 99 143 11.5 11.8 13.3 12.8 13.7 13.9 12.9 
14.2 14.0 15.5 16.9 18.0 17.9 21.8 18.4 34.3 


Find a 98% confidence interval for the mean P/E multiples. Interpret the result and state 
any assumptions you have made. 


6.2.3. Let $X_1, \ldots, X_n$ be a random sample from an $N(\mu, \sigma^2)$ distribution, $\sigma^2$ known.
(a) Show that $\hat{\mu} = \bar{X}$ is a maximum likelihood estimator of the population mean $\mu$.
(b) Show that

$$P\left(\bar{X} - \frac{2\sigma}{\sqrt{n}} < \mu < \bar{X} + \frac{2\sigma}{\sqrt{n}}\right) = 0.954.$$

(c) Let

$$P\left(\bar{X} - \frac{k\sigma}{\sqrt{n}} < \mu < \bar{X} + \frac{k\sigma}{\sqrt{n}}\right) = 0.90.$$

Find k.


6.2.4. Let the observed mean of a sample of size 45 be $\bar{x} = 68.51$ from a distribution having variance
110. Find a 95% confidence interval for the true mean $\mu$, interpret the result, and state
any assumptions you have made.


6.2.5. In a random sample of 50 college seniors, 18 indicated that they were planning to pursue a
graduate degree. Find a 98% confidence interval for the true proportion of all college seniors
planning to pursue a graduate degree, interpret the result, and state any assumptions
you have made.


6.2.6. DVD players coming off an assembly line are automatically checked to make sure they are 
not defective. The manufacturer wants an interval estimate of the percentage of DVD players 
that fail the testing procedure. Compute a 90% confidence interval, based on a random 
sample of size 105 in which 17 DVD players failed the testing procedure. Also, interpret the 
result and state any assumptions you have made. 


6.2.7. Studies have shown that the risk of developing coronary disease increases with the level of 
obesity, or accumulation of body fat. A study was conducted on the effect of exercise on 
losing weight. Fifty men who exercised lost an average of 11.4 lb, with a standard deviation
of 4.5 lb. Construct a 95% confidence interval for the mean weight loss through exercise.
Interpret the result and state any assumptions you have made. 


6.2.8. Basing findings on 60 successful pregnancies involving natural birth, an experimenter found 
that the mean pregnancy term was 274 days, with a standard deviation of 14 days. Construct 
a 99% confidence interval for the true mean pregnancy term $\mu$.


6.2.9. Let Y be the binomial random variable with parameter p and n = 400. If the observed value 
of Y is y= 120, find a 95% confidence interval for p. 


6.2.10. For a health screening in a large company, the diastolic and systolic blood pressures of all 
the employees were recorded. In a random sample of 150 employees, 12 were found to 
suffer from hypertension. Find 95% and 98% confidence intervals for the proportion of the 
employees of this company with hypertension. 


6.2.11. In a random sample of 500 items from a large lot of manufactured items, there were 40 
defectives. 
(a) Find a 90% confidence interval for the true proportion of defectives in the lot. 
(b) Is the assumption of normal approximation valid?

(c) Suppose we suspect that another lot has the same proportion of defectives as in the first 
lot. What should be the sample size if we want to estimate the true proportion within 
0.01 with 90% confidence? 


6.2.12. Pesticide concentrations in sediment from irrigation areas can provide information required
to assess exposure and fate of these chemicals in freshwater ecosystems and their likely
impacts to the marine environment. In a study (Jochen F. Muller et al., “Pesticides in sediments
from Queensland Irrigation channels and drains,” Marine Pollution Bulletin 41(7-12),
294-301, 2000), 103 sediment samples were collected from irrigation channels and drains
in 11 agricultural areas of Queensland. In 74 of these samples, they detected DDTs with concentration
levels up to 840 ng g$^{-1}$ dw. Obtain a 95% confidence interval for the proportion
of total number of sediments with detectable DDTs.


6.2.13. Let $\bar{X}$ be the mean of a random sample of size n from an $N(\mu, 16)$ distribution. Find n such
that $P(\bar{X} - 2 < \mu < \bar{X} + 2) = 0.95$.


6.2.14. Let X be a Poisson random variable with parameter $\lambda$. A sample of 150 observations from
this population has a mean equal to 2.5. Construct a 98% confidence interval for $\lambda$.


6.2.15. An opinion poll conducted in March of 1996 by a newspaper (Tampa Tribune) among eligible
voters with a sample size 425 showed that the president, who was seeking reelection, had 
45% support. Give a 95% and a 98% confidence interval for the proportion of support for 
the president. 


6.2.16. A random sample of 100 households located in a large city recorded the number of people
living in the household, Y, and the monthly expenditure for food, X. The following summary
statistics are given.

$$\sum_{i=1}^{100} Y_i = 340, \qquad \sum_{i=1}^{100} Y_i^2 = 1650, \qquad \sum_{i=1}^{100} X_i = 40{,}000, \qquad \sum_{i=1}^{100} X_i^2 = 44{,}000{,}000$$

(a) Form a 95% confidence interval for the mean number of people living in a household 
in this city. 
(b) Form a 95% confidence interval for the mean monthly food expenses. 
(c) For each case just given, interpret the results and state any assumptions you have made. 


6.2.17. Let $X_1, \ldots, X_n$ be a random sample from an exponential distribution with parameter $\theta$.
A sample of 350 observations from this population has a mean equal to 3.75. Construct a
90% confidence interval for $\theta$.


6.2.18. Suppose a coin is tossed 100 times in order to estimate p = P(Head). It is observed that heads
appeared 60 times. Find a 95% confidence interval for p.


6.2.19. Suppose the population is women at least 35 years of age who are pregnant with a 
fetus affected by Down syndrome. We are interested in testing positive on a noninvasive 
screening test for fetuses affected by Down syndrome in women at least 35 years of age. 
In an experiment, suppose 52 of 60 women tested positive. Obtain a 95% confidence 
interval for the true proportion of women at least 35 years of age who are pregnant 
with a fetus affected by Down syndrome who will receive positive test results from this 
procedure. 


6.2.20. (a) Let $X_1, \ldots, X_n$ be a random sample from a Poisson distribution with parameter $\lambda$.
Derive a $(1 - \alpha)100\%$ large sample confidence interval for $\lambda$.

(b) To date nodes in a phylogenetic tree, the mean path length (MPL) is used in estimating
the relative age of a node. The following data represent the MPL for 39 nodes (source:
Tom Britton, Bengt Oxelman, Annika Vinnersten, and Kare Bremer, “Phylogenetic dating
with confidence intervals using mean path-lengths”). Assume that the data (given in
centimeters) follow a Poisson distribution with parameter $\lambda$.


65.2 47.0 38.2 13.5 18.0 25.6 16.3 14.0 23.2 18.8 
7.55 13.3 11.00 54.9 22.0 50.1 32.6 26.0 13.0 9.0 
7.2 4.7 45 41.1 45.8 37.0 85 305 29.3 13.8 
7.7 5.5 24.1 12.5 22.33 19.0 95 4.7 3.0 


Obtain a 95% confidence interval for $\lambda$ and interpret.


6.2.21. A person plans to start an Internet service provider in a large city. The plan requires an 
estimate of the average number of minutes of Internet use of a household in a week. How 
many households must be (randomly) sampled to be 95% sure that the sample mean is 
within 15 minutes of the population mean? Assume that a pilot study estimated the value 
of o = 35 minutes. 


6.2.22. The fruit fly Drosophila melanogaster normally has a gray color. However, because of mutation 
a good portion of them are black. A biologist eager to learn about the effect of mutation 
wants to collect a random sample to estimate the proportion of black fruit flies of this type 
within 1% error with 95% confidence. 


(a) How many individual flies should the researcher capture if there is no information on 
the population proportion of black flies? 

(b) Suppose the researcher has the initial estimate that 25% of the fruit fly Drosophila 
melanogaster have been affected by this mutation. What is the sample size? 


6.2.23. In a pharmacological experiment, 35 lab rats were not given water for 11 hours and were
then permitted access to water for 1 hour. The amounts of water consumed (mL/hour) are
given in the following table.


10.6 13.3 15.5 10.7 9.6 12.1 11.8 10.9 9.9 13.2
9.3 11.7 9.9 13.0 12.3 11.0 13.1 11.0 12.5 13.9
14.1 14.8 15.1 12.8 14.0 7.1 14.1 12.7 9.6 12.5
9.0 12.7 13.6 12.5 12.6


Obtain a 98% confidence interval for the mean amount of water consumed. 


6.3 SMALL SAMPLE CONFIDENCE INTERVALS FOR $\mu$


Now we will consider the problem of finding a confidence interval for the true mean $\mu$ of a normal
population when the variance $\sigma^2$ is unknown and obtaining a large sample is either impossible or
impractical. Let $X_1, \ldots, X_n$ be a random sample from a normal population. We know that

$$T = \frac{(\bar{X} - \mu)/(\sigma/\sqrt{n})}{\sqrt{(n - 1)S^2/[\sigma^2(n - 1)]}} = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

has a t-distribution with (n − 1) degrees of freedom, irrespective of the value of $\sigma^2$. Thus,
$(\bar{X} - \mu)/(S/\sqrt{n})$ can be used as a pivot. Hence, for n small (n < 30) and $\sigma^2$ unknown, we have the
following result.


Theorem 6.3.1 If $\bar{X}$ and S are the sample mean and the sample standard deviation of a random sample of
size n from a normal population, then

$$\bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}$$

is a $(1 - \alpha)100\%$ confidence interval for the population mean $\mu$.


Note that if the confidence coefficient, $1 - \alpha$, and $\bar{X}$ and S remain the same, the confidence range
$CR = \theta_U - \theta_L$ decreases as the sample size n increases, which means that we are closing in on the
true parameter value of $\theta$.


One can use the following procedure to find the confidence interval for the mean when a small 
sample is from an approximately normal distribution. 


PROCEDURE TO FIND SMALL SAMPLE CONFIDENCE INTERVAL FOR $\mu$
1. Calculate the values of $\bar{X}$ and S from the sample $X_1, \ldots, X_n$.
2. Using the t-table, select two tail values $-t_{\alpha/2,\,n-1}$ and $t_{\alpha/2,\,n-1}$.
3. The $(1 - \alpha)100\%$ confidence interval for $\mu$ is

$$\left(\bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}},\ \ \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}\right),$$

that is,

$$P\left(\bar{X} - t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,\,n-1}\frac{S}{\sqrt{n}}\right) = 1 - \alpha.$$

4. Conclusion: We are $(1 - \alpha)100\%$ confident that the true parameter $\mu$ lies in the interval
$(\bar{X} - t_{\alpha/2,\,n-1}(S/\sqrt{n}),\ \bar{X} + t_{\alpha/2,\,n-1}(S/\sqrt{n}))$.
5. Assumption: The population is normal.


In practice, the first step in the previous procedure should include a test of normality (see Project 4C).
A built-in test of normality is available in most statistical software packages. In Example 6.3.3,
we show how this test is utilized. Even when the data fail the normality test, most statistical software
will produce a confidence interval based on normality or give an error report. We should understand
that generally such answers are meaningless. In those cases, nonparametric methods (Chapter 12)
such as the Wilcoxon rank sum method or bootstrap methods (Chapter 13) will be more appropriate.
For more discussion, refer to Section 14.4.1.


Example 6.3.1
The following is a random sample from a normal population.


7.2 5.7 4.9 6.2 8.5 2.8


Construct a 95% confidence interval for the population mean $\mu$. Interpret.


Solution

The first step is to calculate the mean and standard deviation of the sample. We compute the mean $\bar{x} =
5.883$ and the standard deviation s = 1.959. For 5 degrees of freedom and $\alpha = 0.05$, from the t-table,
$t_{0.025,\,5} = 2.571$. Hence, a 95% confidence interval for $\mu$ is

$$\left(\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \ \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}\right) = \left(5.883 - 2.571\,\frac{1.959}{\sqrt{6}},\ \ 5.883 + 2.571\,\frac{1.959}{\sqrt{6}}\right) = (3.827, 7.939).$$

We are 95% confident that the true mean $\mu$ lies between 3.827 and 7.939.
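
The small sample procedure can be sketched in a few lines. The code below assumes the sample values of Example 6.3.1 are 7.2, 5.7, 4.9, 6.2, 8.5, and 2.8, which reproduce the stated mean 5.883 and standard deviation 1.959. Because the Python standard library has no t quantile function, the critical value 2.571 is passed in from the t-table, as in the procedure above. The function name `t_interval` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_interval(data, t_crit):
    """CI xbar +/- t_crit * s / sqrt(n); t_crit is t_{alpha/2, n-1} from a t-table."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation (n - 1 divisor)
    half_width = t_crit * s / sqrt(n)
    return xbar - half_width, xbar + half_width


data = [7.2, 5.7, 4.9, 6.2, 8.5, 2.8]
lo, hi = t_interval(data, t_crit=2.571)  # 5 degrees of freedom, alpha = 0.05
print(round(lo, 2), round(hi, 2))  # 3.83 7.94
```

The unrounded computation agrees with (3.827, 7.939) to within the rounding of the intermediate values $\bar{x}$ and s.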


Example 6.3.2 
The scores of a random sample of 16 people who took the TOEFL (Test of English as a Foreign Language) 
had a mean of 540 and a standard deviation of 50. Construct a 95% confidence interval for the population
mean $\mu$ of the TOEFL score, assuming that the scores are normally distributed.


Solution
Because n = 16 is small, using Theorem 6.3.1 with 15 degrees of freedom, a 95% confidence interval for $\mu$ is

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 540 \pm 2.131\left(\frac{50}{\sqrt{16}}\right).$$

So the 95% confidence interval for the population mean $\mu$ of the TOEFL scores is (513.36, 566.64).


A Dobson unit is the most basic measure used in ozone research. The unit is named after G. M. B. Dobson,
one of the first scientists to investigate atmospheric ozone (between 1920 and 1960). He designed the
Dobson spectrometer, the standard instrument used to measure ozone from the ground. The data in
Example 6.3.3 represent the total ozone levels at randomly selected points on the earth (represented
by the pair (Latitude, Longitude)) on a particular day from the NASA site
http://jwocky.gsfc.nasa.gov/teacher/ozone_overhead.html?228,110. You could use this site to find the amount of the total column ozone
over where you are now with a two-day delay.


Example 6.3.3 
The following data represent the total ozone levels measured in Dobson units at randomly selected locations 
of earth on a particular day. 


269 246 388 354 266 303 
295 259 274 249 271 254 

Can we say that the data are approximately normally distributed? Construct a 95% confidence interval for
the population mean $\mu$ of ozone levels on this day.


Solution 
The following is the probability plot of these data created using Minitab. 


[Figure: Normal probability plot for ozone data (vertical axis: Percent). ML estimates: mean 285.667, standard deviation 42.0086.]


Because all the data values lie within the bounds on the normal probability plot (see the discussion in
Section 3.2.4), we can assume that the data are approximately normal. We have $\bar{x} = 285.7$ and s = 43.9.
Also n = 12. For $\alpha = 0.05$, $t_{0.025,\,11} = 2.201$. A 95% confidence interval for $\mu$ is

$$\bar{x} \pm t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} = 285.7 \pm 2.201\,\frac{43.9}{\sqrt{12}}.$$

Hence, a 95% confidence interval for $\mu$, the average ozone level over the earth, is (257.81, 313.59).
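
The plot reports a maximum likelihood standard deviation of 42.0086, whereas the interval uses s = 43.9. The difference is only the divisor: the ML estimate divides the sum of squared deviations by n, and the sample standard deviation divides by n − 1. The sketch below shows both and recomputes the interval from the raw data; the critical value 2.201 is taken from the t-table, since the Python standard library does not provide t quantiles.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

ozone = [269, 246, 388, 354, 266, 303, 295, 259, 274, 249, 271, 254]

n = len(ozone)
xbar = mean(ozone)
sd_ml = pstdev(ozone)  # divisor n: matches the "Std Dev" shown with the plot
s = stdev(ozone)       # divisor n - 1: the s used in the t interval

t_crit = 2.201         # t_{0.025, 11} from the t-table
half_width = t_crit * s / sqrt(n)
ci = (xbar - half_width, xbar + half_width)

print(round(xbar, 3), round(sd_ml, 4), round(s, 1))
print(tuple(round(v, 1) for v in ci))
```

Working from unrounded values gives an interval of roughly (257.8, 313.5), which agrees with (257.81, 313.59) up to the rounding of $\bar{x}$ and s in the hand calculation.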


EXERCISES 6.3


6.3.1. (a) How does the t-distribution compare with the normal distribution?
(b) How does the difference affect the size of confidence intervals constructed using z 
(normal approximation) relative to those constructed using the t-distribution? 
(c) Does sample size make a difference? 
(d) What assumptions do we need to make in using the t-distribution for the construction 
of a confidence interval? 


6.3.2. Use the t-table to determine the values of $t_{\alpha/2}$ that would be used in the construction of a
confidence interval for a population mean in each of the following cases:
(a) $\alpha = 0.99$, n = 20
(b) $\alpha = 0.95$, n = 18
(c) $\alpha = 0.90$, n = 25


6.3.3. Let $X_1, \ldots, X_n$ be a random sample from a normal population. A particular realization 
resulted in a sample mean of 20 with the sample standard deviation 4. Construct a 95% 
confidence interval for $\mu$ when: 

(a) $n = 5$, (b) $n = 10$, and (c) $n = 25$. What happens to the length of the confidence interval 
as $n$ changes? 


6.3.4. In a large university, the following are the ages of 20 randomly chosen employees: 


24 31 28 43 28 56 48 39 52 32 
38 49 51 49 62 33 41 58 63 56 


Assuming that the data come from a normal population, construct a 95% confidence interval 
for the population mean $\mu$ of the ages of the employees of this university. Interpret your 
answer. 


6.3.5. A random sample of size 26 is drawn from a population having a normal distribution. The 
sample mean and the sample standard deviation from the data are given, respectively, as 
$\bar{x} = -2.22$ and $s = 1.67$. Construct a 98% confidence interval for the population mean $\mu$ 
and interpret. 


6.3.6. A drug is suspected of causing an elevated heart rate in a certain group of high-risk patients. 
Twenty patients from the group were given the drug. The changes in heart rates were found 
to be as follows. 


-1 8 5 10 2 12 7 9 1 3 
4 6 4 12 11 2 -1 10 2 8 


Construct a 98% confidence interval for the mean change in heart rate. Assume that the 
population has a normal distribution. Interpret your answer. 


6.3.7. Ten bearings made by a certain process have a mean diameter of 0.905 cm with a standard 
deviation of 0.0050 cm. Assuming that the data may be viewed as a random sample from a 
normal population, construct a 95% confidence interval for the actual average diameter of 
bearings made by this process and interpret. 


6.3.8. Air pollution in large U.S. cities is monitored to see whether it conforms to requirements set 
by the Environmental Protection Agency. The following data, expressed as an air pollution 
index, give the air quality of a city for 10 randomly selected days. 


57.3 58.1 58.7 66.7 58.6 61.9 59.0 64.4 62.6 64.9 


Assuming that the data may be looked upon as a random sample from a normal population, 
construct a 95% confidence interval for the actual average air pollution index for this city 
and interpret. 


6.3.9. In order to find out the average hemoglobin (Hb) level in children with chronic diarrhea, 
a random sample of 10 children with chronic diarrhea is selected from a city and their Hb 
levels (g/dL) are obtained as follows: 


12.3 11.4 14.2 15.3 14.8 13.8 11.1 15.1 15.8 13.2 


Assuming that the data may be looked upon as a random sample from a normal population, 
construct a 99% confidence interval for the actual average Hb level in children with chronic 
diarrhea for this city and interpret. Draw a box plot and normal plot for this data, and 
comment. 


6.3.10. Suppose that you need to estimate the mean number of typographical errors per page in 
the rough draft of a 400-page book. A careful examination of 10 pages gives an average of 6 
errors per page with a standard deviation of 2 errors. Assuming that the data may be looked 
upon as a random sample from a normal population, construct a 99% confidence interval 
for the actual average number of errors per page in this book and interpret. In this problem, 
is the normal model appropriate? 


6.3.11. Creatine kinase (CK) is found predominantly in muscle and is released into the 
circulation during muscular lesions. Therefore, serum CK activity has been theoretically 
expected to be useful as a marker in exercise physiology and sports medicine for 
the detection of muscle injury and overwork. The following data represent the peak 
CK activity (measured in IU/L) after 90 minutes of exercise in 15 healthy young 
men. (Source: Manabu Totsuka, Shigeyuki Nakaji, Katsuhiko Suzuki, Kazuo Sugawara, 
and Koki Sato, Break point of serum creatine kinase release after endurance exercise, 
http://jap.physiology.org/cgi/content/full/93/4/1280.) 


1112 722 689 251 196 185 128 102 166 178 
775 694 514 244 208 


Construct a 95% confidence interval for the mean peak CK activity. 


6.3.12. A random sample of 20 observations gave the following summary statistics: $\sum x_i = 234$ 
and $\sum x_i^2 = 3048$. Assuming that the data may be looked upon as a random sample 
from a normal population, construct a 95% confidence interval for the actual 
average, $\mu$. 


6.3.13. Let a random sample of size 17 from a normal population for which both mean $\mu$ and 
variance $\sigma^2$ are unknown yield $\bar{x} = 3.12$ and $s^2 = 1.04$. Determine a 99% confidence 
interval for $\mu$. 


6.3.14. A random sample from a normal population yields the following 25 values: 


90 87 121 96 106 107 89 107 83 92 
117 93 98 120 97 109 78 87 99 79 
104 85 91 107 89 


(a) Calculate an unbiased estimate $\hat{\mu}$ of the population mean. 
(b) Give an approximate 99% confidence interval for the population mean. 


6.3.15. The following are random data from a normal population. 


3.3 33 4.7 26 64 4.7 1.7 45 5.0 3.0 


Construct a 98% confidence interval for the population mean $\mu$. 


6.3.16. The following data represent the rates (micrometers per hour) at which a razor cut made in 
the skin of anesthetized newts is closed by new cells. 


28 20 21 39 32 23 18 31 14 23 
18 22 28 24 33 12 23 21 25 25 


(a) Can we say that the data are approximately normally distributed? 

(b) Find a 95% confidence interval for the population mean rate $\mu$ for the new cells to close a 
razor cut made in the skin of anesthetized newts. 

(c) Find a 99% confidence interval for $\mu$. 

(d) Is the 95% CI wider or narrower than the 99% CI? Briefly explain why. 


6.3.17. For a particular car, when the brake is applied at 62 mph, the following data give 
stopping distance (in feet) for 10 random trials on a dry surface. (Source: 
http://www.nhtsa.dot.gov/cars/testing/brakes/b.pdf.) 


146.9 148.4 149.4 148.6 150.3 
147.5 147.5 149.3 148.4 145.5 


(a) Can we say that the data are approximately normally distributed? 
(b) Find a 95% confidence interval for the population mean stopping distance $\mu$. 


6.4 A CONFIDENCE INTERVAL FOR THE POPULATION VARIANCE 


In this section we derive a confidence interval for the population variance $\sigma^2$ based on the 
chi-square distribution ($\chi^2$-distribution). Recall that the $\chi^2$-distribution, like the Student $t$-distribution, 
is indexed by a parameter called the degrees of freedom. However, the $\chi^2$-distribution is not symmetric 
and covers positive values only, and hence it cannot be used to describe a random variable that assumes 
negative values. Let $X_1, \ldots, X_n$ be normally distributed with mean $\mu$ and variance $\sigma^2$, with both $\mu$ 
and $\sigma$ unknown. We know that 

$$\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} = \frac{(n-1)S^2}{\sigma^2}$$

has a $\chi^2$-distribution with $(n-1)$ degrees of freedom irrespective of $\sigma^2$. Hence it can be used as a 
pivot. We now find two numbers $\chi^2_L$ and $\chi^2_U$ such that 

$$P\left(\chi^2_L \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_U\right) = 1 - \alpha.$$

The foregoing inequality can be rewritten as 

$$P\left(\frac{(n-1)S^2}{\chi^2_U} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_L}\right) = 1 - \alpha.$$

Hence, a $(1-\alpha)100\%$ confidence interval for $\sigma^2$ is given by $((n-1)S^2/\chi^2_U,\ (n-1)S^2/\chi^2_L)$. For 
convenience, we take the areas to the right of $\chi^2_U = \chi^2_{\alpha/2}$ and to the left of $\chi^2_L = \chi^2_{1-\alpha/2}$ to be both 
equal to $\alpha/2$; see Figure 6.3. Using the chi-square table we can find the values of $\chi^2_{\alpha/2}$ and $\chi^2_{1-\alpha/2}$. 
Then, we have the following result. 


Theorem 6.4.1 If $\bar{X}$ and $S$ are the mean and standard deviation of a random sample of size $n$ from a normal 
population, then 

$$P\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}\right) = 1 - \alpha,$$

where the $\chi^2$-distribution has $(n-1)$ degrees of freedom. 


That is, we are $(1-\alpha)100\%$ confident that the population variance $\sigma^2$ falls in the interval 
$((n-1)S^2/\chi^2_{\alpha/2},\ (n-1)S^2/\chi^2_{1-\alpha/2})$. 


FIGURE 6.3 Chi-square density with equal area on both sides of the CI. 


Example 6.4.1 
A random sample of size 21 from a normal population gave a standard deviation of 9. Determine a 90% 
confidence interval for $\sigma^2$. 


Solution 
Here $n = 21$ and $s^2 = 81$. From the $\chi^2$-table with 20 degrees of freedom, $\chi^2_{0.05} = 31.4104$ and 
$\chi^2_{0.95} = 10.8508$. Therefore, a 90% confidence interval for $\sigma^2$ is obtained from 

$$\left(\frac{(n-1)s^2}{\chi^2_{\alpha/2}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}\right).$$

Thus, we get 

$$\frac{(20)(81)}{31.4104} < \sigma^2 < \frac{(20)(81)}{10.8508},$$

or, we are 90% confident that $51.575 < \sigma^2 < 149.298$. 


We can summarize the steps for obtaining the confidence interval for the true variance as follows. 


PROCEDURE TO FIND CONFIDENCE INTERVAL FOR $\sigma^2$ 
1. Calculate $\bar{x}$ and $s^2$ from the sample $x_1, \ldots, x_n$. 
2. Find $\chi^2_U = \chi^2_{\alpha/2}$ and $\chi^2_L = \chi^2_{1-\alpha/2}$ using the chi-square table with $(n-1)$ degrees of freedom. 
3. Compute the $(1-\alpha)100\%$ confidence interval for the population variance $\sigma^2$ as 
$((n-1)s^2/\chi^2_{\alpha/2},\ (n-1)s^2/\chi^2_{1-\alpha/2})$, where the $\chi^2$-values are with $(n-1)$ degrees of freedom. 


Assumption: The population is normal. 
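
The three steps of this procedure translate directly into code. The sketch below is illustrative only: the function name `variance_interval` is hypothetical, and the two chi-square values are assumed to have been looked up in a table (step 2) rather than computed. It is applied to the numbers of Example 6.4.1.

```python
def variance_interval(n, s2, chi2_upper, chi2_lower):
    """Confidence interval for sigma^2 from a normal sample.

    chi2_upper is the chi-square value with area alpha/2 to its right and
    chi2_lower the value with area alpha/2 to its left, both with n - 1
    degrees of freedom (table lookups, as in step 2 of the procedure).
    """
    if not 0 < chi2_lower < chi2_upper:
        raise ValueError("expected 0 < chi2_lower < chi2_upper")
    scaled = (n - 1) * s2
    return scaled / chi2_upper, scaled / chi2_lower


# Example 6.4.1: n = 21, s = 9, 90% confidence, 20 degrees of freedom.
# Assumption: 31.4104 and 10.8508 are the table values quoted in the text.
low, high = variance_interval(21, 9 ** 2, 31.4104, 10.8508)
print(f"90% CI for the variance: ({low:.3f}, {high:.3f})")
```

Note that the larger table value produces the lower endpoint, because it appears in the denominator.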


Example 6.4.2 
The following data represent cholesterol levels (in mg/dL) of 10 randomly selected patients from a large 
hospital on a particular day. 


360 352 294 160 146 142 318 200 142 116 


Determine a 95% confidence interval for $\sigma^2$. 


Solution 
From the data, we get $\bar{x} = 223$ and standard deviation $s = 96.9$. The following probability graph is 
obtained by Minitab. 

[Figure: Normal probability plot for cholesterol levels. Horizontal axis: Data (100 to 500); vertical axis: Percent.] 


Even though the scattergram does not appear to follow a straight line, the data are still within the band, 
so we can assume approximate normality for the data. (In situations like this, we could also use 
nonparametric tests explained in Chapter 12.) A box plot of the data shows that there are no outliers. From the 
$\chi^2$-table, $\chi^2_{0.025}(9) = 19.023$ and $\chi^2_{0.975}(9) = 2.70$. Therefore a 95% confidence interval for $\sigma^2$ is obtained 
from 

$$\left(\frac{(n-1)S^2}{\chi^2_{\alpha/2}(n-1)},\ \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}(n-1)}\right).$$

Thus, we get 

$$\frac{(9)(96.9)^2}{19.023} < \sigma^2 < \frac{(9)(96.9)^2}{2.70},$$

or, we are 95% confident that $4442.3 < \sigma^2 < 31{,}299$. The numbers look very large because they are 
values of the variance, which is measured in squared units. By taking the square root of the endpoints, we 
can also get a confidence interval for the standard deviation $\sigma$. 


As remarked in the previous example, in general, to find a $(1-\alpha)100\%$ confidence interval for the true 
population standard deviation $\sigma$, take the square roots of the endpoints of the confidence interval for the 
variance. 
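
As a check on Example 6.4.2, and to illustrate the square-root step for $\sigma$, here is a minimal sketch using Python's standard library. The chi-square values 19.023 and 2.70 are assumed from the table values quoted in the solution, and the variable names are illustrative. The script uses the unrounded sample variance, so its endpoints for $\sigma^2$ come out slightly below the hand-computed 4442.3 and 31,299, which were obtained with $s$ rounded to 96.9.

```python
import math
import statistics

# Cholesterol levels (mg/dL) from Example 6.4.2.
cholesterol = [360, 352, 294, 160, 146, 142, 318, 200, 142, 116]

n = len(cholesterol)
s2 = statistics.variance(cholesterol)  # sample variance (divisor n - 1)

# Assumption: table values with 9 degrees of freedom, as quoted in the text.
chi2_upper, chi2_lower = 19.023, 2.70

var_low = (n - 1) * s2 / chi2_upper
var_high = (n - 1) * s2 / chi2_lower

# Interval for the standard deviation: square roots of the variance endpoints.
sd_low, sd_high = math.sqrt(var_low), math.sqrt(var_high)

print(f"95% CI for the variance: ({var_low:.1f}, {var_high:.1f})")
print(f"95% CI for the standard deviation: ({sd_low:.2f}, {sd_high:.2f})")
```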


EXERCISES 6.4 


6.4.1. A random sample of size 20 is drawn from a population having a normal distribution. The 
sample mean and the sample standard deviation from the data are given, respectively, as 
$\bar{x} = -2.2$ and $s = 1.42$. Construct a 90% confidence interval for the population variance $\sigma^2$ 
and interpret. 


6.4.2. A drug is suspected of causing an elevated heart rate in a certain group of high-risk patients. 
Twenty patients from the group were given the drug. The changes in heart rates were found 
to be as follows. 


-1 8 5 10 2 12 7 9 1 3 
4 6 4 12 11 2 -1 10 2 8 


Construct a 95% confidence interval for the variance of change in heart rate. Assume that 
the population has a normal distribution and interpret. 


6.4.3. Air pollution in large U.S. cities is monitored to see whether it conforms to requirements set 
by the Environmental Protection Agency. The following data, expressed as an air pollution 
index, give the air quality of a city for 10 randomly selected days. 


56.23 57.12 57.7 65.80 59.40 
62.90 58.00 64.56 63.92 63.45 


Assuming that the data may be viewed as a random sample from a normal population, 
construct a 99% confidence interval for the actual variance of the air pollution index for this 
city and interpret. 


6.4.4. A random sample of 25 observations gave the following summary statistics: $\sum x_i = 234$ 
and $\sum x_i^2 = 3048$. Assuming that the data can be looked upon as a random sample from a 
normal population, construct a 95% confidence interval for the actual variance, $\sigma^2$. 


6.4.5. Let a random sample of size 18 from a normal population with both mean $\mu$ and variance 
$\sigma^2$ unknown yield $\bar{x} = 2.27$ and $s^2 = 1.02$. Determine a 99% confidence interval for $\sigma^2$. 


6.4.6. Suppose we want to study contaminated fish in a river. It is important for the study to know 
the size of the variance $\sigma^2$ in the fish weights. The 25 samples of fish in the study produced 
the following summary statistics: $\bar{x} = 1030.5$ g, and the standard deviation $s = 200.6$ g. 
Construct a 95% confidence interval for the true variation in weights of contaminated fish 
in this river. 


6.4.7. A random sample from a normal population yields the following 25 values: 


90 87 121 96 106 107 89 107 83 92 
117 93 98 120 97 109 78 87 99 79 
104 85 91 107 89 


(a) Calculate an unbiased estimate $\hat{\sigma}^2$ of the population variance. 
(b) Give an approximate 99% confidence interval for the population variance. 
(c) Interpret your results and state any assumptions you made in order to solve the problem. 


6.4.8. It is known that some brands of peanut butter contain impurities within an acceptable level. 
A test conducted on 12 randomly selected jars of a certain brand of peanut butter resulted 
in the following percentages of impurities: 


19 2.7 2.1 2.8 23 36 1.4 1.8 2.1 3.2 2.0 




(a) Construct a 95% confidence interval for the average percentage of impurities in this 
brand of peanut butter. 

(b) Give an approximate 95% confidence interval for the population variance. 

(c) Interpret your results and test for normality. 


6.4.9. The following data represent the maximal head measurements (across the top of the skull) 
in millimeters of 15 Etruscans (inhabitants of ancient Etruria). 


152 147 126 140 135 139 149 140 
142 147 132 148 146 143 137 


(a) Calculate an unbiased estimate $\hat{\sigma}^2$ of the population variance. 
(b) Give an approximate 95% confidence interval for the population variance. 
(c) Interpret your results and test for normality. 


6.4.10. A pharmaceutical company tested a new drug to be marketed for the treatment of a particular 
type of virus. In order to obtain an estimate on the mean recovery time, this drug was tested 
on 15 volunteer patients, and the recovery time (in days) was recorded. The following data 
were obtained. 


8 17 10 6 34 11 13 6 9 8 
19 4 12 17 7 


(a) Obtain a 95% confidence interval estimate of the mean recovery. 
(b) What assumptions do we need to make? Test for these assumptions. 


6.4.11. The rates of return (rounded to the nearest percentage) for 25 clients of a financial firm are 
given in the following table. 


13 11 28 6 -4 15 13 6 11 1 
3 12 20 3 16 16 15 8 20 15 
4 1 12 2 -9 


Find a 98% confidence interval for the variance $\sigma^2$ of rates of return. Use this to find the 
confidence interval for the population standard deviation, $\sigma$. 


6.4.12. In order to test the precision of a new type of blood sugar monitor for diabetic patients, 20 
randomly selected monitors of this type were used. A blood sample with 120 mg/dL was 
tested in each of these monitors, and the resulting readings are given in the following table. 


17 6 121 120 122 117 #120 120 118 119 
18 123 119 123 119 122 118 122 121 120 


(a) Obtain a 99% confidence interval for the variance $\sigma^2$. 
(b) Is it reasonable to assume that the data follow a normal distribution? 


6.5 CONFIDENCE INTERVAL CONCERNING TWO POPULATION PARAMETERS 


In the earlier sections we studied confidence limits for true parameters based on samples from a single 
population. Now, we consider interval estimation based on samples from two populations. Our 
interest is to obtain a confidence interval for the parameters of interest based on two independent 
samples taken from these two populations. 


Let $X_{11}, \ldots, X_{1n_1}$ be a random sample from a normal distribution with mean $\mu_1$ and variance $\sigma_1^2$, and 
let $X_{21}, \ldots, X_{2n_2}$ be a random sample from a normal distribution with mean $\mu_2$ and variance $\sigma_2^2$. Let 
$\bar{X}_1 = (1/n_1)\sum_{i=1}^{n_1} X_{1i}$ and $\bar{X}_2 = (1/n_2)\sum_{i=1}^{n_2} X_{2i}$. We will assume that the two samples are 
independent. Then $\bar{X}_1$ and $\bar{X}_2$ are independent. The distribution of $\bar{X}_1 - \bar{X}_2$ is 
$N(\mu_1 - \mu_2,\ (1/n_1)\sigma_1^2 + (1/n_2)\sigma_2^2)$. Now, as in the one-sample case, the confidence interval for 
$\mu_1 - \mu_2$ is obtained as follows. 


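LARGE SAMPLE CONFIDENCE INTERVAL FOR THE DIFFERENCE OF TWO MEANS 


(i) $\sigma_1$, $\sigma_2$ are known. The $(1-\alpha)100\%$ large sample confidence interval for $\mu_1 - \mu_2$ is 
given by 

$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$

(ii) If $\sigma_1$ and $\sigma_2$ are not known, they can be replaced by the respective sample standard 
deviations $S_1$ and $S_2$ when $n_i \ge 30$, $i = 1, 2$. Thus, we can write 

$$(\bar{X}_1 - \bar{X}_2) \pm z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}}.$$

Assumptions: The populations are normal, and the samples are independent. 

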
Example 6.5.1 
A study of two kinds of machine failures shows that 58 failures of the first kind took on average 
79.7 minutes to repair with a standard deviation of 18.4 minutes, whereas 71 failures of the second kind 
took on average 87.3 minutes to repair with a standard deviation of 19.5 minutes. Find a 99% confidence 
interval for the difference between the true average amounts of time it takes to repair failures of the two 
kinds. 


Solution 
Here, $n_1 = 58$, $n_2 = 71$, $\bar{x}_1 = 79.7$, $s_1 = 18.4$, $\bar{x}_2 = 87.3$, and $s_2 = 19.5$. Then the 99% confidence interval 
for $\mu_1 - \mu_2$ is given by 

$$(79.7 - 87.3) \pm 2.575\sqrt{\frac{(18.4)^2}{58} + \frac{(19.5)^2}{71}}.$$

That is, we are 99% certain that $\mu_1 - \mu_2$ is located in the interval $(-16.215, 1.0149)$. Note that 
$-16.215 < \mu_1 - \mu_2 < 1.0149$ means that more than 90% of the length of this interval is negative. Thus, we can 
conclude that $\mu_2$ dominates $\mu_1$, that is, $\mu_2 > \mu_1$ more than 90% of the time. 
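
The arithmetic of Example 6.5.1 can be reproduced with a few lines of code. This is a sketch under stated assumptions: the function name `two_mean_z_interval` is hypothetical, and the critical value $z_{0.005} = 2.575$ is the one used in the solution, supplied by hand rather than computed.

```python
import math


def two_mean_z_interval(mean1, s1, n1, mean2, s2, n2, z_crit):
    """Large sample interval for mu1 - mu2 using sample standard deviations."""
    standard_error = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = mean1 - mean2
    return diff - z_crit * standard_error, diff + z_crit * standard_error


# Example 6.5.1: repair times for two kinds of machine failures.
# Assumption: z_{0.005} = 2.575 is taken from the normal table.
low, high = two_mean_z_interval(79.7, 18.4, 58, 87.3, 19.5, 71, 2.575)
print(f"99% CI for mu1 - mu2: ({low:.3f}, {high:.4f})")
```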


In the small sample case, the problem of constructing confidence intervals for the difference of the 
means from two normal populations with unknown variances can be a difficult one. However, if 
we assume that the two populations have a common but unknown variance, say $\sigma_1^2 = \sigma_2^2 = \sigma^2$, we 
can obtain an estimate of the variance by pooling the two sample data sets. Define the pooled sample 
variance $S_p^2$ as 

$$S_p^2 = \frac{\sum_{i=1}^{n_1}(X_{1i} - \bar{X}_1)^2 + \sum_{i=1}^{n_2}(X_{2i} - \bar{X}_2)^2}{n_1 + n_2 - 2} = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$

Now, when the two samples are independent, 

$$T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

has a $t$-distribution with $n_1 + n_2 - 2$ degrees of freedom. We summarize the CI for $\mu_1 - \mu_2$ below. 


SMALL SAMPLE CONFIDENCE INTERVAL FOR THE DIFFERENCE OF TWO MEANS ($\sigma_1^2 = \sigma_2^2$) 
The small sample $(1-\alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ is 

$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,(n_1+n_2-2)}\,S_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

Assumptions: The samples are independent from two normal populations with equal variances. 


Example 6.5.2 
Independent random samples from two normal populations with equal variances produced the following 
data. 


Sample 1: 1.2 3.1 1.7 2.8 3.0 
Sample 2: 4.2 2.7 3.6 3.9 


(a) Calculate the pooled estimate of $\sigma^2$. 
(b) Obtain a 90% confidence interval for $\mu_1 - \mu_2$. 


Solution 
(a) We have $n_1 = 5$ and $n_2 = 4$. Also, $\bar{x}_1 = 2.36$, $s_1^2 = 0.733$, $\bar{x}_2 = 3.6$, and $s_2^2 = 0.42$. Hence, 

$$s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{(4)(0.733) + (3)(0.42)}{7} = 0.5989.$$

(b) For the confidence coefficient 0.90, $\alpha = 0.10$ and from the $t$-table, $t_{0.05,7} = 1.895$. Thus, a 90% 
confidence interval for $\mu_1 - \mu_2$ is 

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,(n_1+n_2-2)}\,s_p\sqrt{\frac{1}{n_1}+\frac{1}{n_2}} = (2.36 - 3.6) \pm 1.895\sqrt{0.5989\left(\frac{1}{5}+\frac{1}{4}\right)} = -1.24 \pm 0.98 = (-2.22, -0.26).$$

Here, $\mu_2$ dominates $\mu_1$ uniformly. Note that we can narrow the confidence range, $-2.22$ to $-0.26$, by 
increasing $n_1$ and $n_2$ while keeping $1 - \alpha = 0.90$ the same. This means that we are closing in on the 
unknown true value of $\mu_1 - \mu_2$. 
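
Both parts of Example 6.5.2 can be verified from the raw data. The following sketch assumes Python's standard library; the function names `pooled_variance` and `pooled_t_interval` are hypothetical, and $t_{0.05,7} = 1.895$ is taken from the t-table as in the solution.

```python
import math
import statistics

# Data from Example 6.5.2.
sample1 = [1.2, 3.1, 1.7, 2.8, 3.0]
sample2 = [4.2, 2.7, 3.6, 3.9]


def pooled_variance(x, y):
    """Pooled sample variance under the equal-variance assumption."""
    n1, n2 = len(x), len(y)
    weighted = (n1 - 1) * statistics.variance(x) + (n2 - 1) * statistics.variance(y)
    return weighted / (n1 + n2 - 2)


def pooled_t_interval(x, y, t_crit):
    """Small sample interval for mu1 - mu2 with a common variance."""
    n1, n2 = len(x), len(y)
    sp = math.sqrt(pooled_variance(x, y))
    diff = statistics.mean(x) - statistics.mean(y)
    half_width = t_crit * sp * math.sqrt(1 / n1 + 1 / n2)
    return diff - half_width, diff + half_width


sp2 = pooled_variance(sample1, sample2)
# Assumption: t_{0.05, 7} = 1.895 is taken from the t-table.
low, high = pooled_t_interval(sample1, sample2, 1.895)
print(f"pooled variance: {sp2:.4f}")
print(f"90% CI for mu1 - mu2: ({low:.2f}, {high:.2f})")
```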


In the small sample case, if the equality of the variances cannot be reasonably assumed, that is, 
$\sigma_1^2 \ne \sigma_2^2$, we can still use the previous procedure, except that we use the following degrees of freedom 
in obtaining the $t$-value from the table. Let 

$$\nu = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}.$$

The number given in this formula is always rounded down for the degrees of freedom. Hence, in this 
case, a small sample $(1-\alpha)100\%$ confidence interval for $\mu_1 - \mu_2$ is given by 

$$(\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,\nu}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}},$$

where the $t$-distribution has $\nu$ degrees of freedom as given previously. 


Example 6.5.3 


Assume that two populations are normally distributed with unknown and unequal variances. Two 
independent samples are taken with the following summary statistics: 

$n_1 = 16$   $\bar{x}_1 = 20.17$   $s_1 = 4.3$ 
$n_2 = 11$   $\bar{x}_2 = 19.23$   $s_2 = 3.8$ 

Construct a 95% confidence interval for $\mu_1 - \mu_2$. 


Solution 
First let us compute the degrees of freedom, 

$$\nu = \frac{\left(\dfrac{(4.3)^2}{16} + \dfrac{(3.8)^2}{11}\right)^2}{\dfrac{\left((4.3)^2/16\right)^2}{15} + \dfrac{\left((3.8)^2/11\right)^2}{10}} = 23.312.$$

Hence, $\nu = 23$, and $t_{0.025,23} = 2.069$. 
Now a 95% confidence interval for $\mu_1 - \mu_2$ is 

$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,\nu}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}} = (20.17 - 19.23) \pm (2.069)\sqrt{\frac{(4.3)^2}{16}+\frac{(3.8)^2}{11}},$$

which gives the 95% confidence interval as 

$$-2.3106 < \mu_1 - \mu_2 < 4.1906.$$
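
The degrees-of-freedom formula is the step most prone to arithmetic slips, so a short check of Example 6.5.3 may be useful. In this sketch the function name `approximate_df` is hypothetical (the formula is commonly known as the Welch-Satterthwaite approximation, a name the text itself does not use), and $t_{0.025,23} = 2.069$ is assumed from the t-table.

```python
import math


def approximate_df(s1, n1, s2, n2):
    """Degrees of freedom for the unequal-variance interval (before rounding)."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Summary statistics from Example 6.5.3.
mean1, s1, n1 = 20.17, 4.3, 16
mean2, s2, n2 = 19.23, 3.8, 11

nu_raw = approximate_df(s1, n1, s2, n2)
nu = math.floor(nu_raw)  # the text always rounds down

# Assumption: t_{0.025, 23} = 2.069 is taken from the t-table.
t_crit = 2.069
half_width = t_crit * math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
low = (mean1 - mean2) - half_width
high = (mean1 - mean2) + half_width

print(f"degrees of freedom: {nu_raw:.3f} -> {nu}")
print(f"95% CI for mu1 - mu2: ({low:.4f}, {high:.4f})")
```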


In a real-world problem, how do we determine whether $\sigma_1^2 = \sigma_2^2$ or $\sigma_1^2 \ne \sigma_2^2$, so that we can select one of 
the two methods just given? In Chapter 14, we discuss a procedure that tests the homogeneity 
of the variances (i.e., whether $\sigma_1^2 = \sigma_2^2$). For the time being, a good indication is to look at the point 
estimators of $\sigma_1^2$ and $\sigma_2^2$, namely, $S_1^2$ and $S_2^2$. If the point estimators are fairly close to each other, then 
we can take $\sigma_1^2 = \sigma_2^2$. Otherwise, we take $\sigma_1^2 \ne \sigma_2^2$. For a more general method of testing for equality of 
variances, we refer to Section 14.4.3. 


We now give a procedure for a large sample confidence interval for the difference of the true 
proportions, $p_1 - p_2$, in two binomially distributed populations. 


LARGE SAMPLE CONFIDENCE INTERVAL FOR $p_1 - p_2$ 
The $(1-\alpha)100\%$ large sample confidence interval for $p_1 - p_2$ is given by 

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}},$$

where $\hat{p}_1$ and $\hat{p}_2$ are the point estimators of $p_1$ and $p_2$. This approximation is applicable if $\hat{p}_i n_i \ge 5$ 
and $(1 - \hat{p}_i)n_i \ge 5$, $i = 1, 2$. The two samples are independent. 



Example 6.5.4 

Iron deficiency, the most common nutritional deficiency worldwide, has negative effects on work 
capacity and on motor and mental development. In the 1999-2000 National Health and Nutrition 
Examination Survey (NHANES), iron deficiency was detected in 58 of 573 white, non-Hispanic females 
(10%, rounded to a whole number) and in 95 of 498 black, non-Hispanic females (19%, rounded to a whole 
number) (source: http://www.cdc.gov/mmwr/preview/mmwrhtml/mm5140a1.htm). Let $p_1$ be the 
proportion of white, non-Hispanic females with iron deficiency and let $p_2$ be the proportion of black, 
non-Hispanic females with iron deficiency. Obtain a 95% confidence interval for $p_1 - p_2$. 


Solution 
Here, $n_1 = 573$ and $n_2 = 498$. Also, $\hat{p}_1 = 58/573 = 0.10122 \approx 0.1$, and $\hat{p}_2 = 95/498 = 0.1907 \approx 0.19$. For 
$\alpha = 0.05$, $z_{0.025} = 1.96$. Hence, a 95% confidence interval for $p_1 - p_2$ is 

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} = (0.1 - 0.19) \pm (1.96)\sqrt{\frac{(0.1)(0.9)}{573} + \frac{(0.19)(0.81)}{498}} = (-0.13232, -0.047685).$$

Here, the true difference $p_1 - p_2$ is located in the negative portion of the real line, which tells us that the 
true proportion of black, non-Hispanic females with iron deficiency is larger than the proportion of white, 
non-Hispanic females with iron deficiency. 
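
The interval in Example 6.5.4 can be recomputed as follows. This is a minimal sketch: the function name `two_proportion_interval` is hypothetical, $z_{0.025} = 1.96$ is supplied from the normal table, and the call uses the rounded estimates 0.10 and 0.19 exactly as the solution does. The function also checks the large sample condition stated with the formula.

```python
import math


def two_proportion_interval(p1_hat, n1, p2_hat, n2, z_crit):
    """Large sample interval for p1 - p2 from two independent samples."""
    for p_hat, n in ((p1_hat, n1), (p2_hat, n2)):
        # Applicability condition from the text: n*p_hat >= 5 and n*(1 - p_hat) >= 5.
        if p_hat * n < 5 or (1 - p_hat) * n < 5:
            raise ValueError("large sample condition is not satisfied")
    standard_error = math.sqrt(
        p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2
    )
    diff = p1_hat - p2_hat
    return diff - z_crit * standard_error, diff + z_crit * standard_error


# Example 6.5.4 with the rounded estimates used in the solution.
low, high = two_proportion_interval(0.10, 573, 0.19, 498, 1.96)
print(f"95% CI for p1 - p2: ({low:.5f}, {high:.6f})")
```

Passing the unrounded estimates 58/573 and 95/498 instead gives a slightly different interval, which is one way to see how much the rounding in the hand calculation matters.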


There are situations in applied problems that make it necessary to study and compare the true variances 
of two independent normal distributions. For this purpose, we will find a confidence interval for the 
ratio $\sigma_1^2/\sigma_2^2$ using the $F$-distribution. Let $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ be independent samples of size 
$n_1$ and $n_2$ from two normal distributions $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively. Let $S_1^2$ and $S_2^2$ be the 
variances of the two random samples. The confidence interval for the ratio $\sigma_1^2/\sigma_2^2$ is given as follows. 


A $(1-\alpha)100\%$ CONFIDENCE INTERVAL FOR $\sigma_1^2/\sigma_2^2$ 


A $(1-\alpha)100\%$ confidence interval for $\sigma_1^2/\sigma_2^2$ is given by 

$$P\left(\frac{S_1^2}{S_2^2}\left(\frac{1}{F_{n_1-1,\,n_2-1,\,1-\alpha/2}}\right) \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{S_1^2}{S_2^2}\left(\frac{1}{F_{n_1-1,\,n_2-1,\,\alpha/2}}\right)\right) = 1 - \alpha.$$

That is, 

$$\left(\frac{S_1^2}{S_2^2}\left(\frac{1}{F_{n_1-1,\,n_2-1,\,1-\alpha/2}}\right),\ \frac{S_1^2}{S_2^2}\left(\frac{1}{F_{n_1-1,\,n_2-1,\,\alpha/2}}\right)\right).$$

Assumptions: The two populations are normal, and the samples are independent. 


Note that we can also write a $(1-\alpha)100\%$ confidence interval for $\sigma_1^2/\sigma_2^2$ in the form 

$$\left(\frac{S_1^2}{S_2^2}\left(\frac{1}{F_{n_1-1,\,n_2-1,\,1-\alpha/2}}\right),\ \frac{S_1^2}{S_2^2}\,F_{n_2-1,\,n_1-1,\,1-\alpha/2}\right).$$

The following example illustrates how to find the confidence interval for $\sigma_1^2/\sigma_2^2$. 


Example 6.5.5 
Assuming that two populations are normally distributed, two independent random samples are taken with 
the following summary statistics: 

$n_1 = 21$   $\bar{x}_1 = 20.17$   $s_1 = 4.3$ 
$n_2 = 16$   $\bar{x}_2 = 19.23$   $s_2 = 3.8$ 

Construct a 95% confidence interval for $\sigma_1^2/\sigma_2^2$. 


Solution 
Here, $n_1 = 21$, $n_2 = 16$, and $\alpha = 0.05$. Using the $F$-table, we have 

$$F_{n_1-1,\,n_2-1,\,1-\alpha/2} = F(20, 15, 0.975) = 2.76$$

and 

$$F_{n_2-1,\,n_1-1,\,1-\alpha/2} = F(15, 20, 0.975) = 2.57.$$

A 95% confidence interval for $\sigma_1^2/\sigma_2^2$ is 

$$\left(\frac{s_1^2}{s_2^2}\left(\frac{1}{F_{n_1-1,\,n_2-1,\,1-\alpha/2}}\right),\ \frac{s_1^2}{s_2^2}\,F_{n_2-1,\,n_1-1,\,1-\alpha/2}\right) = \left(\left(\frac{4.3}{3.8}\right)^2\left(\frac{1}{2.76}\right),\ \left(\frac{4.3}{3.8}\right)^2(2.57)\right) = (0.46394, 3.2908).$$

That is, we are 95% confident that the ratio of the true variances, $\sigma_1^2/\sigma_2^2$, is located in the interval 
(0.46394, 3.2908). 
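
The second form of the interval, which needs only upper-tail $F$ values, is convenient to code. In the sketch below the function name `variance_ratio_interval` and its parameter names are hypothetical, and the two $F$ values are assumed from the table lookups in Example 6.5.5.

```python
def variance_ratio_interval(s1, s2, f_num_den, f_den_num):
    """Interval for sigma1^2 / sigma2^2 from two independent normal samples.

    f_num_den: F value with (n1 - 1, n2 - 1) degrees of freedom at 1 - alpha/2.
    f_den_num: F value with (n2 - 1, n1 - 1) degrees of freedom at 1 - alpha/2.
    """
    ratio = s1 ** 2 / s2 ** 2
    return ratio / f_num_den, ratio * f_den_num


# Example 6.5.5: s1 = 4.3 (n1 = 21), s2 = 3.8 (n2 = 16).
# Assumption: F(20, 15, 0.975) = 2.76 and F(15, 20, 0.975) = 2.57 from the F-table.
low, high = variance_ratio_interval(4.3, 3.8, 2.76, 2.57)
print(f"95% CI for the variance ratio: ({low:.5f}, {high:.4f})")
```

Since this interval contains 1, the data in this example are consistent with equal population variances.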


EXERCISES 6.5 


6.5.1. A study was conducted to compare two different procedures for assembling components. 
Both procedures were implemented and run for a month to allow employees to learn each 
procedure. Then each was observed for 10 days with the following results. Values are number 
of components assembled per day. 


Procedure I    115 101 113 64 104 97 114 96 87 93 
Procedure II    86  99 100 78  97 111 102 94 88 99 


Construct a 98% confidence interval for the difference in the mean number of 
components assembled by the two methods. Assume that the data for each procedure are from 
approximately normal populations with a common variance. Interpret the result. 


6.5.2. A study was conducted to see the differences between oxygen consumption rates for male 
runners from a college who had been trained by two different methods, one involving 
continuous training for a period of time each day and the other involving intermittent 
training of about the same overall duration. The means, standard deviations, and sample 
sizes are shown in the following table. 


Continuous training     $n_1 = 15$   $\bar{x}_1 = 46.28$   $s_1 = 6.3$ 
Intermittent training   $n_2 = 7$    $\bar{x}_2 = 42.34$   $s_2 = 7.8$ 


If the measurements are assumed to come from normally distributed populations with 
equal variances, estimate the difference between the population means, with confidence 
coefficient 0.95, and interpret. 


6.5.3. Studies have shown that the risk of developing coronary disease increases with the level of 
obesity. A study comparing two methods of losing weight, diet alone and exercise alone, 
was conducted on 82 men over a 1-year period. Forty-two men dieted and lost an average of 
16.0 lb over the year, with a standard deviation of 5.6 lb. Forty-five men who exercised lost an 
average of 10.6 lb, with a standard deviation of 7.9 lb. Construct a 99% confidence interval 
for the difference in the mean weight loss by these two methods. State any assumptions you 
made and interpret the result you obtained. 


6.5.4. The following information was obtained from two independent samples selected from two 
normally distributed populations with unknown but equal variances. 


Sample 1 14 15 12 13 6 14 11 12 17 19 23 
Sample 2 16 18 12 20 15 19 15 22 20 18 23 12 20 


Construct a 95% confidence interval for the difference between the population means and 
interpret. 


6.5.5. In the academic year 2001-2002, two random samples of 25 male professors and 23 female 
professors from a large university produced a mean salary for male professors of $58,550 
with a standard deviation of $4000; the mean for female professors was $53,700 with a 
standard deviation of $3200. Construct a 90% confidence interval for the difference between 
the population mean salaries. Assume that the salaries of male and female professors are 
both normally distributed with equal standard deviations. Interpret the result. 


6.5.6. Let the random variables $X_1$ and $X_2$ follow binomial distributions that have parameters 
$n_1 = 100$, $n_2 = 75$. Let $x_1 = 35$ and $x_2 = 27$ be observed values of $X_1$ and $X_2$. Let $p_1$ and $p_2$ 
be the true proportions. Determine an appropriate 95% confidence interval for $p_1 - p_2$. 


6.5.7. The following information is obtained from two independent samples selected from two 
populations. 


$n_1 = 40$   $\bar{x}_1 = 28.4$   $s_1 = 4.1$ 
$n_2 = 32$   $\bar{x}_2 = 25.6$   $s_2 = 4.5$ 


(a) What is the maximum likelihood estimator of $\mu_1 - \mu_2$? 
(b) Construct a 99% confidence interval for $\mu_1 - \mu_2$. 


6.5.8. In order to compare the mean hemoglobin (Hb) levels of well-nourished and undernourished 
groups of children, random samples from each of these groups yielded the following summary. 


                   Number of children   Sample mean   Sample standard deviation 
Well nourished             95               11.2                 0.9 
Undernourished             75                9.8                 1.2 


Construct a 95% confidence interval for the true difference of means, $\mu_1 - \mu_2$. 


6.5.9. In a certain part of a city, the average price of homes in 2000 was $148,822, and in 2001 
it was $155,908. Suppose these means were based on a random sample of 100 homes in 
1997 and 150 homes in 1998 and that the sample standard deviations of sale prices were 
$21,000 for 2000 and $23,000 for 2001. Find a 98% confidence interval for the difference 
in the two population means. 


6.5.10. Two independent samples from a normal population are taken with the following summary 
statistics: 

$n_1 = 16$   $\bar{x}_1 = 2.4$   $s_1 = 0.1$ 
$n_2 = 11$   $\bar{x}_2 = 2.6$   $s_2 = 0.5$ 

Construct a 95% confidence interval for $\sigma_1^2/\sigma_2^2$. 


6.5.11. The following information was obtained from two independent samples selected from two 
normally distributed populations. 


Sample 1   35 36 33 34 27 35 32 33 38 40 44 
Sample 2   37 39 33 41 36 40 36 43 41 39 44 33 41 


Construct a 90% confidence interval for $\sigma_1^2/\sigma_2^2$. 


6.5.12. The management of a supermarket wanted to study the spending habits of its male and 
female customers. A random sample of 16 male customers who shopped at this supermarket 
showed that they spent an average of $55 with a standard deviation of $12. Another random 
sample of 25 female customers showed that they spent $85 with a standard deviation of 
$20.50. Assuming that the amounts spent at this supermarket by all its male and female 
customers were approximately normally distributed, construct a 90% confidence interval 
for the ratio of variance in spending for males and females, $\sigma_1^2/\sigma_2^2$. 


6.5.13. An experiment is conducted comparing the effectiveness of a new method of teaching algebra 
for eighth-grade students. Twelve gifted and 12 regular students are taught using this method. 
Their scores on a final exam are shown in the following table. 


Average   58 69 55 65 88 52 99 76 45 86 55 79 
Gifted    77 86 84 93 77 91 87 95 68 78 74 58 


(a) Compute the 95% confidence interval for the difference between the mean scores of the 
two groups of students taught by this new method. 

(b) Construct a 95% confidence interval for the ratio of variance in test scores for regular 
and gifted students, $\sigma_1^2/\sigma_2^2$. 

(c) What are the assumptions you made in parts (a) and (b)? Are these assumptions 
justified? 

6.5.14. Assume that two populations have the same variance o. If a sample of size n produced 
a variance Sj from population I and a sample of size nz produced a variance S3 from 
population II, show that the pooled variance 


@ wis 1)S? + (nz — 1)S5 
Pp nytnzg—2 


is an unbiased estimator of o*. Show that (S? + S3)/2 is also an unbiased estimator of 07. 
Which of the two estimators would you prefer? Give reasons for your choice. 
6.6 CHAPTER SUMMARY


This chapter discusses the concept of interval estimation. A $(1 - \alpha)100\%$ confidence interval (CI) for
an unknown parameter $\theta$ is computed from sample data. The so-called pivotal method is introduced
for deriving a confidence interval. Large sample and small sample confidence intervals are derived for
the population mean $\mu$. Confidence intervals in the case of two samples are also discussed. Additionally,
confidence intervals for the variance and the ratio of variances are derived.


The following list gives some of the key definitions introduced in this chapter. 


■ Upper and lower confidence limits
■ Confidence coefficient
■ $100(1 - \alpha)\%$ confidence interval for $\theta$
■ Interval estimation
■ Confidence interval


The following important concepts and procedures are discussed in this chapter. 


■ Pivotal method
■ Procedure to find a confidence interval for $\theta$ using the pivot
■ Procedure to find a large sample confidence interval for $\theta$
■ Procedure to find a small sample confidence interval for $\mu$
■ Procedure to find a confidence interval for the population variance $\sigma^2$
■ Large sample confidence interval for the difference of the means
■ Small sample confidence interval for the difference of two means ($\sigma_1^2 = \sigma_2^2$)
■ Small sample confidence interval for the difference of two means ($\sigma_1^2 \ne \sigma_2^2$)
■ Large sample confidence interval for $p_1 - p_2$
■ A $(1 - \alpha)100\%$ confidence interval for $\sigma_1^2/\sigma_2^2$


6.7 COMPUTER EXAMPLES 


6.7.1 Minitab Examples 




Example 6.7.1 
(Small Sample): Using Minitab, obtain a 95% confidence interval for $\mu$ using the following data


7.227 5.7383 4.9369 6.238 8.4876 2.7618 


Solution 
Use the following commands. 
Enter the data in C1. Then 


Stat > Basic Statistics > 1-sample t... , in variables: enter C1, click Confidence interval, in Level 
default value is 95, if any other value, enter that value, and click OK 
We will obtain the following output.
T Confidence Intervals
Variable N Mean StDev SE Mean 95.0% CI
C1 6 5.898 1.968 0.804 (3.832, 7.964)
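
The same interval can be checked by hand outside Minitab. The following is a minimal sketch, not Minitab's own procedure: it assumes the critical value $t_{0.025,5} = 2.571$ is read from a t-table (hard-coded so that only the Python standard library is needed), and the function name `t_interval` is introduced here purely for illustration.

```python
import math
import statistics

# Data from Example 6.7.1.
data = [7.227, 5.7383, 4.9369, 6.238, 8.4876, 2.7618]

# Assumption: t_{0.025, 5} = 2.571, taken from a t-table rather than computed.
T_CRIT = 2.571


def t_interval(sample, t_crit):
    """Return (mean, lower, upper) for the small sample t interval."""
    n = len(sample)
    center = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error of the mean
    return center, center - t_crit * se, center + t_crit * se


center, lower, upper = t_interval(data, T_CRIT)
print(f"mean = {center:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

For any other confidence level or sample size, the hard-coded critical value would have to be replaced by the corresponding table entry.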




Example 6.7.2 
(Large Sample): For the data


6.8 5.6 8.5 8.5 8.4 7.5 9.3 9.4 7.8 7.1 9.9
9.6 9.0 13.7 9.4 16.6 9.1 10.1 10.6 11.1 8.9 11.7
12.8 11.5 10.6 12.0 11.1 6.4 12.3 12.3 11.4 9.9 15.5
14.3 11.5 13.3 11.8 12.8 13.7 13.9 12.9 14.2 14.0


obtain a 98% confidence interval for $\mu$.


Solution 
Enter the data in C1. Then click 


Stat > Basic Statistics > 1-Sample Z... > in Variables: type C1 > click Confidence interval, and 
enter 98 in Level: > enter 5 in Sigma: > OK 


We will obtain the following output. 


THE ASSUMED SIGMA = 5.00
Variable N MEAN STDEV SE MEAN 98.0 PERCENT C.I.
C1 49 12.124 4.700 0.714 (10.462, 13.787)
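
As a rough cross-check of the z interval, here is a minimal sketch using only the Python standard library. It assumes the data are the 49 P/E values listed in Example 6.7.5 with their decimal points restored (that listing reproduces N = 49 and a mean of 12.124), and it takes $\sigma = 5$ as in the Minitab dialog. The function name `z_interval` is hypothetical.

```python
import math
from statistics import NormalDist, mean

# Assumption: the 49 P/E values of Example 6.7.5, decimal points restored.
pe = [
    6.8, 5.6, 8.5, 8.5, 8.4, 7.5, 9.3, 9.4, 7.8, 7.1,
    9.9, 9.6, 9.0, 16.6, 9.1, 10.1, 10.6, 11.1, 8.9, 11.7,
    12.8, 11.5, 12.0, 10.6, 11.1, 6.4, 11.4, 9.9, 14.3, 11.5,
    11.8, 13.3, 13.9, 12.9, 14.2, 14.0, 15.5, 17.9, 21.8, 18.4,
    34.3, 13.7, 12.3, 18.0, 9.4, 12.3, 16.9, 12.8, 13.7,
]


def z_interval(sample, sigma, level):
    """Large sample interval: mean +/- z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # upper alpha/2 point
    center = mean(sample)
    half_width = z * sigma / math.sqrt(len(sample))
    return center - half_width, center + half_width


lower, upper = z_interval(pe, sigma=5.0, level=0.98)
print(f"98% CI = ({lower:.3f}, {upper:.3f})")
```

The result should agree with the Minitab interval up to rounding in the last digit.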




Example 6.7.3 
For the following data, find a 90% confidence interval for $\mu_1 - \mu_2$


Sample 1 | 1.2 | 3.1 | 1.7 | 2.8 | 3.0 
Sample 2 | 4.2 | 2.7 | 3.6 | 3.9 


Solution 
Enter sample 1 in C1 and sample 2 in C2. Then click 


Stat > Basic Statistics > 2-Sample t... > click Samples in different columns > in First: enter C1 and
in Second: enter C2 > enter 90 in Confidence Level: (if equality of variance can be assumed, click
Assume equal variances) > OK
We will obtain the following output:
TWOSAMPLE T FOR C1 VS C2


N MEAN STDEV SE MEAN
C1 5 2.360 0.856 0.38
C2 4 3.600 0.648 0.32


90 PCT CI FOR MU C1 - MU C2: (-2.22, -0.26)
TTEST MU C1 = MU C2 (VS NE): T = -2.39 P = 0.048 DF = 7
POOLED STDEV = 0.774
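
The pooled interval can also be reproduced step by step. The sketch below assumes equal variances (matching the pooled output) and a table value $t_{0.05,7} = 1.895$, hard-coded to avoid any dependency beyond the standard library. The helper name `pooled_t_interval` is illustrative only.

```python
import math
import statistics

sample1 = [1.2, 3.1, 1.7, 2.8, 3.0]
sample2 = [4.2, 2.7, 3.6, 3.9]

# Assumption: t_{0.05, 7} = 1.895 from a t-table (n1 + n2 - 2 = 7 df).
T_CRIT = 1.895


def pooled_t_interval(x, y, t_crit):
    """Return (lower, upper, pooled_sd) for mu1 - mu2 under equal variances."""
    n1, n2 = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances.
    sp2 = ((n1 - 1) * statistics.variance(x)
           + (n2 - 1) * statistics.variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = statistics.mean(x) - statistics.mean(y)
    return diff - t_crit * se, diff + t_crit * se, math.sqrt(sp2)


lower, upper, pooled_sd = pooled_t_interval(sample1, sample2, T_CRIT)
print(f"90% CI = ({lower:.2f}, {upper:.2f}), pooled sd = {pooled_sd:.3f}")
```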


6.7.2 SPSS Examples 


Example 6.7.4 
Consider the data 


66 74 79 80 77 78 65 79 81 69 


Using SPSS, obtain a 99% confidence interval for $\mu$.
Solution 


One easy way to obtain the confidence interval in SPSS is to use the hypothesis testing procedure. The 
procedure is as follows: First enter the data in C1. Then click 


Analyze > Compare Means > One-sample t Test... , > Move var00001 to Test Variable(s), and 
Click Options... , and enter 99 in Confidence interval:, click Continue, and OK 


Note that the default value is 95%. 
We will obtain the following output: 


One-Sample Statistics

         | N  | Mean    | Std. deviation | Std. error mean
VAR00001 | 10 | 74.8000 | 5.99630        | 1.89620


One-Sample Test (Test Value = 0)

         | t      | df | Sig. (2-tailed) | Mean difference | 99% confidence interval of the difference
         |        |    |                 |                 | Lower   | Upper
VAR00001 | 39.447 | 9  | .000            | 74.8000         | 68.6377 | 80.9623


From this, we obtain the 99% confidence interval as (68.6377, 80.9623). 
6.7.3 SAS Examples

Example 6.7.5 


The following data give P/E for a particular year of 49 mutual fund companies owned by a randomly 
selected mutual fund. 


6.8 5.6 8.5 8.5 8.4 7.5 9.3 9.4 7.8 7.1
9.9 9.6 9.0 16.6 9.1 10.1 10.6 11.1 8.9 11.7
12.8 11.5 12.0 10.6 11.1 6.4 11.4 9.9 14.3 11.5
11.8 13.3 13.9 12.9 14.2 14.0 15.5 17.9 21.8 18.4
34.3 13.7 12.3 18.0 9.4 12.3 16.9 12.8 13.7


Find a 98% confidence interval for the mean P/E multiples. Use SAS procedures. 


Solution 
We could use the following procedure. 


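/* Read the 49 P/E values; @@ allows several values per input line. */
DATA peratio;
INPUT peratio @@;
DATALINES;
6.8 5.6 8.5 8.5 8.4 7.5 9.3 9.4 7.8
7.1 9.9 9.6 9.0 9.4 13.7 16.6 9.1 10.1 10.6
11.1 8.9 11.7 12.8 11.5 12.0 10.6 11.1 6.4 12.3
12.3 11.4 9.9 14.3 11.5 11.8 13.3 12.8 13.7 13.9 12.9
14.2 14.0 15.5 16.9 18.0 17.9 21.8 18.4 34.3
;

/* alpha = 0.02 requests the lower (lclm) and upper (uclm) 98% confidence limits. */
PROC MEANS data = peratio lclm uclm alpha = 0.02;
var peratio;
RUN;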


We will obtain the following output: 


The MEANS Procedure 
Analysis Variable : peratio 


Lower 98% Upper 98% 
CL for Mean CL for Mean 


10.5084971 13.7404825 


Hence, we obtain the 98% confidence interval for the mean P/E ratio as (10.51, 13.74).


EXERCISES 6.7 


6.7.1. Using any of the software packages (Minitab, SPSS, or SAS), obtain confidence intervals for 
at least one data set taken from each section of this chapter. 
PROJECTS FOR CHAPTER 6
6A. Simulation of Coverage of the Small Sample Confidence Intervals for $\mu$


(a) Generate 25 samples of size 15 from a normal population with $\mu = 10$ and $\sigma^2 = 4$. Using
a statistical package (such as Minitab), compute the 95% confidence intervals for each of
the samples using the small sample formula. From your output, determine the proportion
of the 25 intervals that cover the true mean $\mu = 10$.

(b) What would you expect if the sample size is increased to 100? Would the width of the interval 
increase or decrease? Would you expect more or fewer of these intervals to contain the true 
mean 10? Check your answers with actual computation. 

(c) Repeat with 20 samples of size 10. 
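
One possible way to set up the simulation in part (a) is sketched below in Python instead of a statistical package. It is only an illustration under stated assumptions: the critical value $t_{0.025,14} = 2.145$ is taken from a t-table, the seed 2024 is arbitrary, and the function name `coverage` is hypothetical. A second run with many more samples is included to show the long-run coverage.

```python
import math
import random
import statistics

# Assumption: t_{0.025, 14} = 2.145 from a t-table (samples of size 15).
T_CRIT = 2.145


def coverage(num_samples, n, mu, sigma, t_crit, rng):
    """Proportion of small sample t intervals that contain the true mean."""
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        center = statistics.mean(sample)
        half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
        if center - half_width <= mu <= center + half_width:
            hits += 1
    return hits / num_samples


rng = random.Random(2024)  # arbitrary seed, for reproducibility only
prop_25 = coverage(25, 15, 10.0, 2.0, T_CRIT, rng)       # sigma^2 = 4
prop_many = coverage(10000, 15, 10.0, 2.0, T_CRIT, rng)  # long-run check
print(f"25 samples: {prop_25:.2f}; 10000 samples: {prop_many:.3f}")
```

With only 25 samples the observed proportion can differ noticeably from 0.95; the larger run should settle close to the nominal level. For parts (b) and (c) the critical value must be changed to match the new degrees of freedom.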


6B. Confidence Intervals Based on Sampling Distributions 


If we want to obtain a $(1 - \alpha)100\%$ confidence interval for $\theta$, begin with an estimator $\hat{\theta}$ of $\theta$ and
determine its sampling distribution. Now select two probability levels, $\alpha_1$ and $\alpha_2$, so that $\alpha = \alpha_1 + \alpha_2$.
Generally we let $\alpha_1 = \alpha_2$. Take a sample and calculate the value of $\hat{\theta}$, say $\hat{\theta} = k$. Now we need to
determine the values of the upper and lower confidence limits. Find a value $\theta_L$ such that

$$P(\hat{\theta} \ge k) = \alpha_1$$

and $\theta_U$ such that

$$P(\hat{\theta} \le k) = \alpha_2.$$

Then a $(1 - \alpha)100\%$ confidence interval for $\theta$ will be

$$\theta_L < \theta < \theta_U.$$


(a) Let $X_1, \ldots, X_n$ be a random sample from a $U(0, \theta)$ distribution. Obtain a $(1 - \alpha)100\%$
confidence interval for $\theta$, using the method of sampling distribution.

(b) Let $X$ have a binomial distribution with parameters $n$ and $p$. First show that there is no
quantity that satisfies the conditions of a pivotal quantity. Then using the method of sampling
distributions, obtain a $(1 - \alpha)100\%$ confidence interval for $p$.


6C. Large Sample Confidence Intervals: General Case 


The method of finding a confidence interval for a parameter 6 that we described in this chapter depends 
on our ability to find the pivotal quantity. We have seen that such a quantity may not exist. In those 
cases, the method of sampling distribution described in the previous project could be used. However, 
this method can involve some difficult calculations. For large samples, we can utilize the following 
procedure, which is based on the asymptotic distribution of maximum likelihood estimators. Under 
fairly general conditions, the maximum likelihood estimators have a limiting distribution that is 
normal. Also, maximum likelihood estimators are asymptotically efficient. Hence, for a large sample 
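the maximum likelihood estimator $\hat{\theta}$ of $\theta$ will have approximately normal distribution with mean $\theta$.
Also, if the Cramér-Rao lower bound exists, the limiting variance of $\hat{\theta}$ will be

$$\sigma_{\hat{\theta}}^2 = \frac{1}{E\left[\left(\frac{\partial \ln L}{\partial \theta}\right)^2\right]}.$$

Hence,

$$Z = \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} \sim N(0, 1).$$


Then a large sample $(1 - \alpha)100\%$ confidence interval is obtained from the probability statement

$$P\left(-z_{\alpha/2} < \frac{\hat{\theta} - \theta}{\sigma_{\hat{\theta}}} < z_{\alpha/2}\right) \approx 1 - \alpha.$$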
% 


We summarize the procedure to construct large sample confidence intervals. 


1. Determine the maximum likelihood estimator, $\hat{\theta}$, of $\theta$. Also find the maximum likelihood
estimators of all other unknown parameters.

2. Obtain the variance $\sigma_{\hat{\theta}}^2$ (if possible directly, otherwise by using the Cramér-Rao lower bound).

3. In the expression for $\sigma_{\hat{\theta}}$, substitute $\hat{\theta}$ for $\theta$. Replace all other unknown parameters by their
maximum likelihood estimators. Let the resulting quantity be denoted by $s_{\hat{\theta}}$.

4. Now construct a $(1 - \alpha)100\%$ confidence interval for $\theta$ from

$$\hat{\theta} - z_{\alpha/2}\, s_{\hat{\theta}} < \theta < \hat{\theta} + z_{\alpha/2}\, s_{\hat{\theta}}.$$


(a) Using the foregoing procedure, show that a large sample $(1 - \alpha)100\%$ confidence interval
for the parameter $p$ in a binomial distribution based on $n$ trials is

$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}.$$

(b) Let $X_1, \ldots, X_n$ be a random sample from a normal population with parameters $\mu$ and $\sigma^2$.
Derive a large sample confidence interval for $\sigma^2$ using the above procedure.
(c) Let $X_1, \ldots, X_n$ be a random sample from a population with a pdf


so, x>0 
f@M= 


0, otherwise. 


Derive a large sample confidence interval for $\theta$.
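
To see the four-step procedure in action, the sketch below evaluates the binomial interval of part (a) numerically. It is an illustration only: the counts (40 successes in 100 trials) are made-up values, and the function name `large_sample_p_interval` is hypothetical. The steps in the comments refer to the procedure summarized above.

```python
import math
from statistics import NormalDist


def large_sample_p_interval(successes, n, level=0.95):
    """Interval p_hat +/- z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n                          # step 1: MLE of p
    s = math.sqrt(p_hat * (1 - p_hat) / n)         # steps 2-3: plug-in std. dev.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # z_{alpha/2}
    return p_hat - z * s, p_hat + z * s            # step 4


# Hypothetical data: 40 successes in 100 trials.
lower, upper = large_sample_p_interval(40, 100)
print(f"95% CI for p: ({lower:.3f}, {upper:.3f})")
```

The approximation relies on a large number of trials; for small $n$ or for $\hat{p}$ near 0 or 1 the normal limit may be poor.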
6D. Prediction Interval for an Observation from a Normal Population


In many cases, we may be interested in predicting future observations from a population, rather than
making an inference. A $(1 - \alpha)100\%$ prediction interval for a future observation $X$ is an interval of the
form $(X_L, X_U)$ such that $P(X_L < X < X_U) = 1 - \alpha$. Similarly to confidence intervals, we can also
define one-sided prediction intervals. Assume that the population is normal with known variance
$\sigma^2$. Let $X_1, \ldots, X_n$ be a random sample from this population. Then the sampling distribution of
the difference $X - \bar{X}$ (we use $X$ to denote $X_{n+1}$) is normal with mean zero and variance
$\sigma^2 + \sigma_{\bar{X}}^2 = (1 + (1/n))\sigma^2$. Then a $(1 - \alpha)100\%$ prediction interval for $X$ is given by

$$\left(\bar{X} - z_{\alpha/2}\sqrt{\left(1 + \frac{1}{n}\right)\sigma^2},\ \ \bar{X} + z_{\alpha/2}\sqrt{\left(1 + \frac{1}{n}\right)\sigma^2}\right).$$

Thus, we are $(1 - \alpha)100\%$ confident that the next observation, $X_{n+1}$, will lie in this interval. As in
confidence intervals, if the sample size is large, replace $\sigma$ by the sample standard deviation $s$.


In the case where neither $\mu$ nor $\sigma$ is known and the sample size is small (so that the Central Limit
Theorem cannot be applied), it can be shown that $(X_{n+1} - \bar{X}_n)/\left(S_n\sqrt{1 + (1/n)}\right)$ has a $t$-distribution
with $(n - 1)$ degrees of freedom. Thus, a $(1 - \alpha)100\%$ prediction interval for $X_{n+1}$ is given by

$$\left(\bar{X} - t_{\alpha/2,\,n-1}\sqrt{\left(1 + \frac{1}{n}\right)S^2},\ \ \bar{X} + t_{\alpha/2,\,n-1}\sqrt{\left(1 + \frac{1}{n}\right)S^2}\right).$$


A standard measure of the capacity of lungs to expel air in breathing is called forced expiratory 
volume (FEV). The FEV1 is the volume exhaled during the first second of a forced expiratory maneuver 
started from the level of total lung capacity. The following data (source: M. Bland, An Introduction 
to Medical Statistics, Oxford University Press, 1995) represents FEV measurements (in liters) from 
57 male medical students. 


4.47 3.10 4.50 4.90 3.50 4.14 4.32 4.80 3.10 4.68 
4.47 3.57 2.85 5.10 5.20 4.80 5.10 4.30 4.70 4.08 
3.48 4.20 3.70 5.30 4.71 4.10 4.30 3.39 3.69 4.44 
5.00 4.50 4.20 4.16 3.70 3.83 3.90 4.47 3.30 5.43 
3.42 3.60 3.20 4.56 4.78 3.60 3.96 3.19 2.85 3.04 
3.78 3.75 4.05 3.54 4.14 2.98 3.54 


Obtain a 95% prediction interval for a future observation $X_{n+1}$.
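
One way to organize the computation is sketched below. It assumes the large sample version of the interval applies ($n = 57$, with $\sigma$ replaced by the sample standard deviation $s$ and a normal critical value); whether that or the $t$-based form is preferred is a judgment call. The function name `prediction_interval` is introduced only for this illustration.

```python
import math
from statistics import NormalDist, mean, stdev

# FEV measurements (liters) of the 57 male medical students listed above.
fev = [
    4.47, 3.10, 4.50, 4.90, 3.50, 4.14, 4.32, 4.80, 3.10, 4.68,
    4.47, 3.57, 2.85, 5.10, 5.20, 4.80, 5.10, 4.30, 4.70, 4.08,
    3.48, 4.20, 3.70, 5.30, 4.71, 4.10, 4.30, 3.39, 3.69, 4.44,
    5.00, 4.50, 4.20, 4.16, 3.70, 3.83, 3.90, 4.47, 3.30, 5.43,
    3.42, 3.60, 3.20, 4.56, 4.78, 3.60, 3.96, 3.19, 2.85, 3.04,
    3.78, 3.75, 4.05, 3.54, 4.14, 2.98, 3.54,
]


def prediction_interval(sample, level=0.95):
    """Large sample interval: x_bar +/- z_{alpha/2} * sqrt((1 + 1/n) * s^2)."""
    n = len(sample)
    center = mean(sample)
    s = stdev(sample)  # assumption: s stands in for the unknown sigma
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * math.sqrt((1 + 1 / n) * s ** 2)
    return center, center - half_width, center + half_width


center, lower, upper = prediction_interval(fev)
print(f"x_bar = {center:.3f}, 95% prediction interval = ({lower:.3f}, {upper:.3f})")
```

Note that the interval is wider than the confidence interval for $\mu$ from the same data, because it accounts for the variability of a single new observation as well as that of $\bar{X}$.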


Chapter 7


Hypothesis Testing


Objective: In this chapter, various methods of testing hypotheses will be discussed. 
Jerzy Neyman


(Source: http://sciencematters.berkeley.edu/archives/volume2/issue12/legacy.php)


Jerzy Neyman (1894-1981) made far-reaching contributions in hypothesis testing, confidence
intervals, probability theory, and other areas of mathematical statistics. His work with Egon Pearson gave
logical foundation and mathematical rigor to the theory of hypothesis testing. Their ideas made sure
that samples were large enough to avoid false representation. Neyman made a broader impact in
statistics throughout his lifetime.


7.1 INTRODUCTION 


Statistics plays an important role in decision making. In statistics, one utilizes random samples to 
make inferences about the population from which the samples were obtained. Statistical inference 
regarding population parameters takes two forms: estimation and hypothesis testing, although both 
hypothesis testing and estimation may be viewed as different aspects of the same general problem of 
arriving at decisions on the basis of observed data. We already saw several estimation procedures in 
earlier chapters. Hypothesis testing is the subject of this chapter. Hypothesis testing has an important 
role in the application of statistics to real-life problems. Here we utilize the sampled data to make 
decisions concerning the unknown distribution of a population or its parameters. Pioneering work
on the explicit formulation as well as the fundamental concepts of the theory of hypothesis testing
is due to J. Neyman and E. S. Pearson.


A statistical hypothesis is a statement concerning the probability distribution of a random variable 
or population parameters that are inherent in a probability distribution. The following example 
illustrates the concept of hypothesis testing. An important industrial problem is that of accepting or 
rejecting lots of manufactured products. Before releasing each lot for the consumer, the manufacturer 
usually performs some tests to determine whether the lot conforms to acceptable standards. Let us 
say that both the manufacturer and the consumer agree that if the proportion of defectives in a lot is 
less than or equal to a certain number p, the lot will be released. Very often, instead of testing every 
item in the lot, we may test only a few items chosen at random from the lot and make decisions 
about the proportion of defectives in the lot; that is, we make the decisions about the population 
on the basis of sample information. Such decisions are called statistical decisions. In attempting to 
reach decisions, it is useful to make some initial conjectures about the population involved. Such 
conjectures are called statistical hypotheses. Sometimes the results from the sample may be markedly 
different from those expected under the hypothesis. Then we can say that the observed differences 
are significant and we would be inclined to reject the initial hypothesis. These procedures that enable 
us to decide whether to accept or reject hypotheses or to determine whether observed samples differ 
significantly from expected results are called tests of hypotheses, tests of significance, or rules of decision. 


In any hypothesis testing problem, we formulate a null hypothesis and an alternative hypothesis such that 
if we reject the null, then we have to accept the alternative. The null hypothesis usually is a statement 
of either the “status quo” or “no effect.” A guideline for selecting a null hypothesis is that when the 
objective of an experiment is to establish a claim, the nullification of the claim should be taken as 
the null hypothesis. The experiment is often performed to determine whether the null hypothesis is 
false. For example, suppose the prosecution wants to establish that a certain person is guilty. The null 
hypothesis would be that the person is innocent and the alternative would be that the person is guilty. 
Thus, the claim itself becomes the alternative hypothesis. Customarily, the alternative hypothesis is 
the statement that the experimenter believes to be true. For example, the alternative hypothesis is 
the reason a person is arrested (police suspect the person is not innocent). Once the hypotheses 
have been stated, appropriate statistical procedures are used to determine whether to reject the null
hypothesis. For the testing procedure, one begins with the assumption that the null hypothesis is true. 
If the information furnished by the sampled data strongly contradicts (beyond a reasonable doubt) 
the null hypothesis, then we reject it in favor of the alternative hypothesis. If we do not reject the 
null, then we automatically reject the alternative. Note that we always make a decision with respect 
to the null hypothesis. Note that the failure to reject the null hypothesis does not necessarily mean 
that the null hypothesis is true. For example, a person being judged “not guilty” does not mean the 
person is innocent. This basically means that there is not enough evidence to reject the null hypothesis 
(presumption of innocence) beyond “a reasonable doubt.” 


We summarize the elements of a statistical hypothesis in the following. 


THE ELEMENTS OF A STATISTICAL HYPOTHESIS

1. The null hypothesis, denoted by $H_0$, is usually the nullification of a claim. Unless evidence from the
data indicates otherwise, the null hypothesis is assumed to be true.

2. The alternate hypothesis, denoted by $H_a$ (or sometimes denoted by $H_1$), is customarily the claim
itself.

3. The test statistic, denoted by TS, is a function of the sample measurements upon which the
statistical decision, to reject or not reject the null hypothesis, will be based.

4. A rejection region (or a critical region) is the region (denoted by RR) that specifies the values
of the observed test statistic for which the null hypothesis will be rejected. This is the range of
values of the test statistic that corresponds to the rejection of $H_0$ at some fixed level of significance,
$\alpha$, which will be explained later.

5. Conclusion: If the value of the observed test statistic falls in the rejection region, the null hypothesis
is rejected and we will conclude that there is enough evidence to decide that the alternative
hypothesis is true. If the TS does not fall in the rejection region, we conclude that we cannot reject
the null hypothesis.


In practice one may have hypotheses such as $H_0: \mu = \mu_0$ against one of the following alternatives:


$H_a: \mu \ne \mu_0$, called a two-tailed alternative
or $H_a: \mu < \mu_0$, called a lower (or left) tailed alternative
or $H_a: \mu > \mu_0$, called an upper (or right) tailed alternative


A test with a lower or upper tailed alternative is called a one-tailed test. In an applied hypothesis testing 
problem, we can use the following general steps. 


GENERAL METHOD FOR HYPOTHESIS TESTING

1. From the (word) problem, determine the appropriate null hypothesis, $H_0$, and the alternative, $H_a$.
2. Identify the appropriate test statistic and calculate the observed test statistic from the data.
3. Find the rejection region by looking up the critical value in the appropriate table.
4. Draw the conclusion: Reject or fail to reject the null hypothesis, $H_0$.
5. Interpret the results: State in words what the conclusion means to the problem we started with.


It is always necessary to state a null and an alternate hypothesis for every statistical test performed.
All possible outcomes should be accounted for by the two hypotheses.


Example 7.1.1
In a coin-tossing experiment, let $p$ be the probability of heads. We start with the claim that the coin is fair,
that is, $H_0: p = 1/2$. We test this against one of the following alternatives:


(a) $H_a$: The coin is not fair ($p \ne 1/2$). This is a two-tailed alternative.
(b) $H_a$: The coin is biased in favor of heads ($p > 1/2$). This is an upper tailed alternative.
(c) $H_a$: The coin is biased in favor of tails ($p < 1/2$). This is a lower tailed alternative.


It is important to observe that the test statistic is a function of a random sample. Thus, the test statistic 
itself is a random variable whose distribution is known under the null hypothesis. The value of a test 
statistic when specific sample values are substituted is called the observed test statistic or simply test 
statistic. 


For example, consider the hypothesis $H_0: \mu = \mu_0$ versus $H_a: \mu \ne \mu_0$, where $\mu_0$ is known. Assume
that the population is normal with a known variance $\sigma^2$. Consider $\bar{X}$, an unbiased estimator of $\mu$
based on the random sample $X_1, \ldots, X_n$. Then $Z = (\bar{X} - \mu_0)/(\sigma/\sqrt{n})$ is a function of the random
sample $X_1, \ldots, X_n$ and has a known distribution, a standard normal, under $H_0$. If $x_1, x_2, \ldots, x_n$ are
specific sample values, then $z = (\bar{x} - \mu_0)/(\sigma/\sqrt{n})$ is called the observed sample statistic or simply sample
statistic.


Definition 7.1.1 A hypothesis is said to be a simple hypothesis if that hypothesis uniquely specifies 
the distribution from which the sample is taken. Any hypothesis that is not simple is called a composite 
hypothesis. 


Example 7.1.2
Refer to Example 7.1.1. The null hypothesis $p = 1/2$ is simple, because the hypothesis completely specifies
the distribution, which in this case will be a binomial with $p = 1/2$ and with $n$ being the number of tosses.
The alternative hypothesis $p \ne 1/2$ is composite because the distribution now is not completely specified
(we do not know the exact value of $p$).


Because the decision is based on the sample information, we are prone to commit errors. In a statistical
test, it is impossible to establish the truth of a hypothesis with 100% certainty. There are two possible 
types of errors. On the one hand, one can make an error by rejecting Ho when in fact it is true. On 
the other hand, one can also make an error by failing to reject the null hypothesis when in fact it is 
false. Because the errors arise as a result of wrong decisions, and the decisions themselves are based 
on random samples, it follows that the errors have probabilities associated with them. We now have 
the following definitions. 
Table 7.1 Statistical Decision and Error Probabilities


Statistical decision | True state of null hypothesis
                     | $H_0$ true | $H_0$ false
Do not reject $H_0$ | Correct decision | Type II error ($\beta$)
Reject $H_0$ | Type I error ($\alpha$) | Correct decision

The decision and the errors are represented in Table 7.1. 


Definition 7.1.2 (a) A type I error is made if $H_0$ is rejected when in fact $H_0$ is true. The probability of
type I error is denoted by $\alpha$. That is,

$$P(\text{rejecting } H_0 \mid H_0 \text{ is true}) = \alpha.$$

The probability of type I error, $\alpha$, is called the level of significance.


(b) A type II error is made if $H_0$ is accepted when in fact $H_a$ is true. The probability of a type II error is
denoted by $\beta$. That is,

$$P(\text{not rejecting } H_0 \mid H_0 \text{ is false}) = \beta.$$


It is desirable that a test should have $\alpha = \beta = 0$ (this can be achieved only in trivial cases), or at least
we prefer to use a test that minimizes both types of errors. Unfortunately, it so happens that for a
fixed sample size, as $\alpha$ decreases, $\beta$ tends to increase and vice versa. There are no hard and fast rules
that can be used to make the choice of $\alpha$ and $\beta$. This decision must be made for each problem based
on quality and economic considerations. However, in many situations it is possible to determine
which of the two errors is more serious. It should be noted that a type II error is only an error in
the sense that a chance to correctly reject the null hypothesis was lost. It is not an error in the sense
that an incorrect conclusion was drawn, because no conclusion is made when the null hypothesis is
not rejected. In the case of type I error, a conclusion is drawn that the null hypothesis is false when,
in fact, it is true. Therefore, type I errors are generally considered more serious than type II errors.
For example, it is mostly agreed that finding an innocent person guilty is a more serious error than
finding a guilty person innocent. Here, the null hypothesis is that the person is innocent, and the
alternate hypothesis is that the person is guilty. “Not rejecting the null hypothesis” is equivalent to
acquitting a defendant. It does not prove that the null hypothesis is true, or that the defendant is
innocent. In statistical testing, the significance level $\alpha$ is the probability of wrongly rejecting the null
hypothesis when it is true (that is, the risk of finding an innocent person guilty). Here the type II risk
is acquitting a guilty defendant. The usual approach to hypothesis testing is to find a test procedure
that limits $\alpha$, the probability of type I error, to an acceptable level while trying to lower $\beta$ as much as
possible.


[Figure: distributions under $H_0$ and under $H_a$ with the critical value marked; Prob(Type I error) = Alpha, Prob(Type II error) = Beta]


The consequences of different types of errors are, in general, very different. For example, if a doctor 
tests for the presence of a certain illness, incorrectly diagnosing the presence of the disease (type I 
error) will cause a waste of resources, not to mention the mental agony to the patient. On the other 
hand, failure to determine the presence of the disease (type II error) can lead to a serious health risk. 


To formulate a hypothesis testing problem, consider the following situation. Suppose a toy store
chain claims that at least 80% of girls under 8 years old prefer dolls over other types of toys. We feel
that this claim is inflated. In an attempt to dispose of this claim, we observe the buying pattern of 20
randomly selected girls under 8 years old, and we observe $X$, the number of girls under 8 years old
who buy stuffed toys or dolls. Now the question is, how can we use $X$ to confirm or reject the store’s
claim? Let $p$ be the probability that a girl under 8 chosen at random prefers stuffed toys or dolls. The
question now can be reformulated as a hypothesis testing problem. Is $p \ge 0.8$ or $p < 0.8$? Because we
would like to reject the store’s claim only if we are highly certain of our decision, we should choose
the null hypothesis to be $H_0: p \ge 0.8$, the rejection of which is considered to be more serious, and
the alternative to be $H_a: p < 0.8$. In order to make the null
hypothesis simple, we will use $H_0: p = 0.8$, which is the boundary value, with the understanding that
it really represents $H_0: p \ge 0.8$. We note that $X$, the number of girls under 8 years old who prefer
stuffed toys or dolls, is a binomial random variable. Clearly a large sample value of $X$ would favor
$H_0$. Suppose we arbitrarily choose to accept the null hypothesis if $X > 12$. Because our decision is
based on only a sample of 20 girls under 8, there is always a possibility of making errors whether
we accept or reject the store chain’s claim. In the following example, we will now formally state this
problem and calculate the error probabilities based on our decision rule.


Example 7.1.3
A toy store chain claims that at least 80% of girls under 8 years old prefer dolls over other types of toys.
After observing the buying pattern of many girls under 8 years old, we feel that this claim is inflated. In an
attempt to dispose of this claim, we observe the buying pattern of 20 randomly selected girls under 8 years
old, and we observe $X$, the number of girls who buy stuffed toys or dolls. We wish to test the hypothesis
$H_0: p = 0.8$ against $H_a: p < 0.8$. Suppose we decide to accept $H_0$ if $X > 12$ (that is, $X \ge 13$). This
means that if $X \le 12$ (that is, $X < 13$) we will reject $H_0$.

(a) Find $\alpha$.

(b) Find $\beta$ for $p = 0.6$.

(c) Find $\beta$ for $p = 0.4$.

(d) Find the rejection region of the form $\{X \le K\}$ so that (i) $\alpha = 0.01$; (ii) $\alpha = 0.05$.

(e) For the alternative $H_a: p = 0.6$, find $\beta$ for the values of $\alpha$ in part (d).
Solution
The TS $X$ is the number of girls under 8 years old who buy dolls. $X$ follows the binomial distribution with
$n = 20$ and $p$, the unknown population proportion of girls under 8 who prefer dolls. We now calculate $\alpha$
and $\beta$.
(a) For $p = 0.8$, the probability of type I error is

$$\alpha = P\{\text{reject } H_0 \mid H_0 \text{ is true}\} = P\{X \le 12 \mid p = 0.8\} = \sum_{x=0}^{12} \binom{20}{x} (0.8)^x (0.2)^{20-x} = 0.0321.$$

If we calculate $\alpha$ for any other value of $p > 0.8$, then we will find that it is smaller than 0.0321.
Hence, there is at most a 3.21% chance of rejecting a true null hypothesis. That is, if the store’s claim
is in fact true, then the chance that our test will erroneously reject that claim is at most 3.21%.

(b) Here $p = 0.6$. The probability of type II error is

$$\beta = P\{\text{accept } H_0 \mid H_0 \text{ false}\} = P\{X > 12 \mid p = 0.6\} = 1 - P\{X \le 12 \mid p = 0.6\} = 1 - 0.584 = 0.416,$$

so there is a 41.6% chance of accepting a false null hypothesis. Thus, in case the store’s claim is not
true, and the truth is that only 60% of girls under 8 years old prefer dolls over other types of toys,
then there is a 41.6% chance that our test will erroneously conclude that the store’s claim is true.
(c) If $p = 0.4$, then

$$\beta = P\{\text{accept } H_0 \mid H_0 \text{ false}\} = P\{X > 12 \mid p = 0.4\} = 1 - P\{X \le 12 \mid p = 0.4\} = 1 - 0.979 = 0.021.$$

That is, there is a 2.1% chance of accepting a false null hypothesis.
(d) (i) To find $K$ such that

$$\alpha = P\{X \le K \mid p = 0.8\} = 0.01,$$

from the binomial table, $K = 11$. Hence, the rejection region is: Reject $H_0$ if $X \le 11$.
(ii) To find $K$ such that

$$\alpha = P\{X \le K \mid p = 0.8\} = 0.05,$$

from the binomial table, $\alpha = 0.05$ falls between $K = 12$ and $K = 13$. However, for $K = 13$, the
value for $\alpha$ is 0.087, exceeding 0.05. If we want to limit $\alpha$ to be no more than 0.05, we will
have to take $K = 12$. That is, we reject the null hypothesis if $X \le 12$, yielding $\alpha = 0.0321$
as shown in (a).

(e) (i) When $\alpha = 0.01$, from (d), the rejection region is of the form $\{X \le 11\}$. For $p = 0.6$,

$$\beta = P\{\text{accept } H_0 \mid H_0 \text{ false}\} = P\{X > 11 \mid p = 0.6\} = 1 - P\{X \le 11 \mid p = 0.6\} = 1 - 0.404 = 0.596.$$

(ii) From (a) and (b), for testing the hypothesis $H_0: p = 0.8$ against $H_a: p < 0.8$ with $n = 20$,
we see that when $\alpha$ is 0.0321, $\beta$ is 0.416. From (d)(i) and (e)(i), for the same hypothesis, we
see that when $\alpha$ is 0.01, $\beta$ is 0.596. This holds in general. Thus, we observe that for fixed $n$, as
$\alpha$ decreases, $\beta$ increases and vice versa.
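
The table lookups in this example can be reproduced with exact binomial sums. The following sketch uses only the Python standard library; the helper names `binom_cdf` and `largest_cutoff` are introduced here for illustration and are not part of any package.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))


def largest_cutoff(n, p0, alpha_max):
    """Largest K with P(X <= K | p0) <= alpha_max, or None if no K qualifies."""
    best = None
    for k in range(n + 1):
        if binom_cdf(k, n, p0) <= alpha_max:
            best = k
        else:
            break  # the cdf is increasing in k, so no later k can qualify
    return best


N = 20
alpha = binom_cdf(12, N, 0.8)          # part (a): reject H0 when X <= 12
beta_06 = 1 - binom_cdf(12, N, 0.6)    # part (b)
beta_04 = 1 - binom_cdf(12, N, 0.4)    # part (c)
k_01 = largest_cutoff(N, 0.8, 0.01)    # part (d)(i)
k_05 = largest_cutoff(N, 0.8, 0.05)    # part (d)(ii)
beta_k01 = 1 - binom_cdf(k_01, N, 0.6) # part (e)(i)

print(f"alpha = {alpha:.4f}, beta(0.6) = {beta_06:.3f}, beta(0.4) = {beta_04:.3f}")
print(f"K(0.01) = {k_01}, K(0.05) = {k_05}, beta at K(0.01) = {beta_k01:.3f}")
```

Because the binomial distribution is discrete, the attained $\alpha$ for a cutoff is usually below the nominal level, which is why `largest_cutoff` looks for the largest $K$ that does not exceed it.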


In the next example, we explore what happens to $\beta$ as the sample size increases, with $\alpha$ fixed.


Example 7.1.4
Let $X$ be a binomial random variable. We wish to test the hypothesis $H_0: p = 0.8$ against $H_a: p = 0.6$. Let
$\alpha = 0.03$ be fixed. Find $\beta$ for $n = 10$ and $n = 20$.


Solution 
For $n = 10$, using the binomial tables, we obtain $P\{X \le 5 \mid p = 0.8\} \approx 0.03$. Hence the rejection region for
the hypothesis $H_0: p = 0.8$ vs. $H_a: p = 0.6$ is given by: reject $H_0$ if $X \le 5$. The probability of type II error is

$$\beta = P\{\text{accept } H_0 \mid p = 0.6\} = P\{X > 5 \mid p = 0.6\} = 1 - P\{X \le 5 \mid p = 0.6\} = 0.633.$$

For $n = 20$, as shown in Example 7.1.3, if we reject $H_0$ for $X \le 12$, we obtain

$$P(X \le 12 \mid p = 0.8) \approx 0.03$$

and

$$\beta = P(X > 12 \mid p = 0.6) = 1 - P\{X \le 12 \mid p = 0.6\} = 0.416.$$

We see that for a fixed $\alpha$, as $n$ increases $\beta$ decreases and vice versa. It can be shown that this result holds in
general.
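
A short numerical check of this example is sketched below, using exact binomial sums with the rejection cutoffs stated in the solution ($X \le 5$ for $n = 10$, $X \le 12$ for $n = 20$). The names `binom_cdf` and `error_probs` are illustrative. Evaluated exactly, the sums give $\beta \approx 0.633$ for $n = 10$ and $\beta \approx 0.416$ for $n = 20$.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))


def error_probs(n, cutoff, p0, pa):
    """Return (alpha, beta) for the rule 'reject H0 if X <= cutoff'."""
    alpha = binom_cdf(cutoff, n, p0)     # reject although p = p0
    beta = 1 - binom_cdf(cutoff, n, pa)  # fail to reject although p = pa
    return alpha, beta


# Cutoffs taken from the solution text; both give alpha close to 0.03.
alpha_10, beta_10 = error_probs(10, 5, 0.8, 0.6)
alpha_20, beta_20 = error_probs(20, 12, 0.8, 0.6)

print(f"n = 10: alpha = {alpha_10:.4f}, beta = {beta_10:.3f}")
print(f"n = 20: alpha = {alpha_20:.4f}, beta = {beta_20:.3f}")
```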


In order for us to compute the value of $\beta$, it is necessary that the alternate hypothesis is simple. Now
we will discuss a three-step procedure to calculate $\beta$.


STEPS TO CALCULATE $\beta$
1. Decide an appropriate test statistic (usually this is a sufficient statistic or an estimator for the
unknown parameter, whose distribution is known under $H_0$).
2. Determine the rejection region using a given $\alpha$, and the distribution of the test statistic (TS).
3. Find the probability that the observed test statistic does not fall in the rejection region assuming
$H_a$ is true. This gives $\beta$. That is,

$$\beta = P(\text{TS falls in the complement of the rejection region} \mid H_a \text{ is true}).$$


Example 7.1.5
A random sample of size 36 from a population with known variance, $\sigma^2 = 9$, yields a sample mean of
$\bar{x} = 17$. Find $\beta$ for testing the hypothesis $H_0: \mu = 15$ versus $H_a: \mu = 16$. Assume $\alpha = 0.05$.


Solution

Here $n = 36$, $\bar{x} = 17$, and $\sigma^2 = 9$. In general, to test $H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$, we proceed as
follows. An unbiased estimator of $\mu$ is $\bar{X}$. Intuitively we would reject $H_0$ if $\bar{X}$ is large, say $\bar{X} > c$. Now using
$\alpha = 0.05$, we will determine the rejection region. By the definition of $\alpha$, we have

$$P(\bar{X} > c \mid \mu = \mu_0) = 0.05$$

or

$$P\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > \frac{c - \mu_0}{\sigma/\sqrt{n}} \,\middle|\, \mu = \mu_0\right) = 0.05.$$

But if $\mu = \mu_0$, because the sample size $n > 30$, $(\bar{X} - \mu_0)/(\sigma/\sqrt{n}) \sim N(0, 1)$. Therefore, the preceding equation is equivalent to

$$P\left(Z > \frac{c - \mu_0}{\sigma/\sqrt{n}}\right) = 0.05.$$

From standard normal tables, we obtain $P(Z > 1.645) = 0.05$. Hence $(c - \mu_0)/(\sigma/\sqrt{n}) = 1.645$, or $c = \mu_0 + 1.645(\sigma/\sqrt{n})$.
Therefore, the rejection region is the set of all sample means $\bar{x}$ such that

$$\bar{x} > \mu_0 + 1.645\left(\frac{\sigma}{\sqrt{n}}\right).$$

Substituting $\mu_0 = 15$ and $\sigma = 3$, we obtain

$$\mu_0 + 1.645(\sigma/\sqrt{n}) = 15 + 1.645\left(\frac{3}{\sqrt{36}}\right) = 15.8225.$$

The rejection region is the set of $\bar{x}$ such that $\bar{x} > 15.8225$.

Then by definition,

$$\beta = P(\bar{X} \le 15.8225 \text{ when } \mu = 16).$$

Consequently, for $\mu = 16$,

$$\beta = P\left(\frac{\bar{X} - 16}{\sigma/\sqrt{n}} \le \frac{15.8225 - 16}{3/\sqrt{36}}\right) = P(Z \le -0.36) = 0.3594.$$

That is, under the given information, there is a 35.94% chance of not rejecting a false null hypothesis.
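The three steps for computing $\beta$ in an upper-tail test on a mean with known $\sigma$ can be sketched in a few lines. This is an illustrative sketch only: the function name `beta_upper_tail` is an assumption made here, and `statistics.NormalDist` from the Python standard library stands in for the normal table.

```python
from math import sqrt
from statistics import NormalDist

def beta_upper_tail(mu0, mua, sigma, n, alpha):
    """Type II error of the upper-tail z-test of H0: mu = mu0 vs Ha: mu = mua.

    Returns (c, beta), where the test rejects H0 when the sample mean exceeds c.
    """
    z = NormalDist()                       # standard normal
    se = sigma / sqrt(n)                   # standard error of the sample mean
    c = mu0 + z.inv_cdf(1 - alpha) * se    # step 2: rejection region is xbar > c
    beta = z.cdf((c - mua) / se)           # step 3: P(xbar <= c | mu = mua)
    return c, beta

c, beta = beta_upper_tail(mu0=15, mua=16, sigma=3, n=36, alpha=0.05)
print(f"c = {c:.4f}, beta = {beta:.4f}")
```

Because the code uses unrounded normal quantiles, its value of `beta` can differ in the third decimal place from a hand calculation that rounds the standardized cutoff to two decimals before consulting a table.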


7.1.1 Sample Size 


It is clear from the preceding example that once we are given the sample size $n$, an $\alpha$, a simple
alternative $H_a$, and a test statistic, we have no control over $\beta$; it is exactly determined. Hence, for
a given sample size and test statistic, any effort to lower $\beta$ will lead to an increase in $\alpha$ and vice versa.
This means that for a test with fixed sample size it is not possible to simultaneously reduce both $\alpha$
and $\beta$. We also notice from Example 7.1.4 that by increasing the sample size $n$, we can decrease $\beta$
(for the same $\alpha$) to an acceptable level. The following discussion illustrates that it may be possible to
determine the sample size for a given $\alpha$ and $\beta$.


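Suppose we want to test $H_0: \mu = \mu_0$ versus $H_a: \mu > \mu_0$. Given $\alpha$ and $\beta$, we want to find $n$, the
sample size, and $K$, the point at which the rejection begins. We know that

$$\alpha = P(\bar{X} > K \text{ when } \mu = \mu_0) = P\left(\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > \frac{K - \mu_0}{\sigma/\sqrt{n}} \text{ when } \mu = \mu_0\right) = P(Z > z_\alpha) \tag{7.1}$$

and

$$\beta = P(\bar{X} \le K \text{ when } \mu = \mu_a) = P\left(\frac{\bar{X} - \mu_a}{\sigma/\sqrt{n}} \le \frac{K - \mu_a}{\sigma/\sqrt{n}} \text{ when } \mu = \mu_a\right) = P(Z \le -z_\beta). \tag{7.2}$$

From Equations (7.1) and (7.2),

$$z_\alpha = \frac{K - \mu_0}{\sigma/\sqrt{n}}$$

and

$$-z_\beta = \frac{K - \mu_a}{\sigma/\sqrt{n}}.$$

This gives us two equations with two unknowns ($K$ and $n$), and we can proceed to solve them.
Eliminating $K$, we get

$$\mu_0 + z_\alpha \frac{\sigma}{\sqrt{n}} = \mu_a - z_\beta \frac{\sigma}{\sqrt{n}}.$$

From this we can derive

$$\sqrt{n} = \frac{(z_\alpha + z_\beta)\sigma}{\mu_a - \mu_0}.$$

Thus, the sample size for an upper tail alternative hypothesis is

$$n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2}.$$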

The sample size increases with the square of the standard deviation and decreases with the square of
the difference between the mean value under the alternative hypothesis and the mean value under the null
hypothesis. Note that in real-world problems, care should be taken in the choice of the value of $\mu_a$
for the alternative hypothesis. It may be tempting for a researcher to take a large value of $\mu_a$ in order
to reduce the required sample size. This will seriously affect the accuracy (power) of the test. This
alternative value must be realistic within the experiment under study. Care should also be taken in
the choice of the standard deviation $\sigma$. Using an underestimated value of the standard deviation to
reduce the sample size will result in inaccurate conclusions, similar to overestimating the difference
of means. Usually, the value of $\sigma$ is estimated using a similar study conducted earlier. The problem
could be that the previous study is old and does not represent the new reality. When accuracy is
important, it may be necessary to conduct a pilot study just to get some idea of the estimate of $\sigma$.
Once we determine the necessary sample size, we must devise a procedure by which the appropriate
data can be randomly obtained. This aspect of the design of experiments is discussed in Chapter 9.


Example 7.1.6
Let $\sigma = 3.1$ be the true standard deviation of the population from which a random sample is chosen. How
large should the sample size be for testing $H_0: \mu = 5$ versus $H_a: \mu = 5.5$, in order that $\alpha = 0.01$ and
$\beta = 0.05$?


Solution
We are given $\mu_0 = 5$ and $\mu_a = 5.5$. Also, $z_\alpha = z_{0.01} = 2.33$ and $z_\beta = z_{0.05} = 1.645$. Hence, the
sample size

$$n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2} = \frac{(2.33 + 1.645)^2 (3.1)^2}{(0.5)^2} = 607.37.$$

So, $n = 608$ will provide the desired levels. That is, in order for us to test the foregoing hypothesis, we must
randomly select 608 observations from the given population.
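The sample-size formula for the upper-tail test is easy to evaluate by machine. The sketch below assumes a helper named `required_n` (a name chosen here, not taken from the text) and evaluates the formula twice: once with quantiles from `statistics.NormalDist`, and once with the rounded table values 2.33 and 1.645.

```python
from math import ceil
from statistics import NormalDist

def required_n(z_alpha, z_beta, sigma, mu0, mua):
    """Smallest integer n with n >= (z_alpha + z_beta)^2 sigma^2 / (mua - mu0)^2."""
    return ceil(((z_alpha + z_beta) * sigma / (mua - mu0)) ** 2)

z = NormalDist()
z_alpha = z.inv_cdf(1 - 0.01)   # upper 0.01 quantile of N(0, 1)
z_beta = z.inv_cdf(1 - 0.05)    # upper 0.05 quantile of N(0, 1)

n_exact = required_n(z_alpha, z_beta, sigma=3.1, mu0=5, mua=5.5)
n_table = required_n(2.33, 1.645, sigma=3.1, mu0=5, mua=5.5)
print(n_exact, n_table)
```

The two answers differ by one observation because the table value 2.33 is slightly larger than the unrounded quantile; the required sample size is sensitive to this kind of rounding when $\mu_a - \mu_0$ is small relative to $\sigma$.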

From a practical standpoint, the researcher typically chooses $\alpha$ and the sample size, and $\beta$ is ignored.
Because a trade-off exists between $\alpha$ and $\beta$, choosing a very small value of $\alpha$ will tend to increase $\beta$ in
a serious way. A general rule of thumb is to pick reasonable values of $\alpha$, possibly in the 0.05 to 0.10
range, so that $\beta$ will remain reasonably small.


EXERCISES 7.1 


7.1.1. An appliance manufacturer is considering the purchase of a new machine for stamping out
sheet metal parts. If $\mu_0$ (given) is the true average of the number of good parts stamped out
per hour by their old machine and $\mu$ is the corresponding true unknown average for the
new machine, the manufacturer wants to test the null hypothesis $\mu = \mu_0$ versus a suitable
alternative. What should the alternative be if he does not want to buy the new machine
unless it is (a) more productive than the old one? (b) at least 20% more productive than the
old one?


7.1.2. Formulate an alternative hypothesis for each of the following null hypotheses. 
(a) Ho: Support for a presidential candidate is unchanged after the start of the use of TV 
commercials. 
(b) Ho: The proportion of viewers watching a particular local news channel is less 
than 30%. 
(c) Ho: The median grade point average of undergraduate mathematics majors is 2.9. 


7.1.3. It is suspected that a coin is not balanced (not fair). Let $p$ be the probability of tossing a head.
To test $H_0: p = 0.5$ against the alternative hypothesis $H_a: p > 0.5$, a coin is tossed 15 times.
Let $Y$ equal the number of times a head is observed in the 15 tosses of this coin. Assume the
rejection region to be $\{Y > 10\}$.
(a) Find $\alpha$.
(b) Find $\beta$ for $p = 0.7$.
(c) Find $\beta$ for $p = 0.6$.
(d) Find the rejection region of the form $\{Y > K\}$ for $\alpha = 0.01$ and $\alpha = 0.03$.
(e) For the alternative $H_a: p = 0.7$, find $\beta$ for the values of $\alpha$ given in (d).


7.1.4. In Exercise 7.1.3:
(a) Assume that the rejection region is $\{Y > 8\}$. Calculate $\alpha$ and $\beta$ if $p = 0.6$. Compare the
results with the corresponding values obtained in Exercise 7.1.3. (This gives the effect of
enlarging the rejection region on $\alpha$ and $\beta$.)
(b) Assume that the rejection region is $\{Y > 8\}$. Calculate $\alpha$ and $\beta$ if $p = 0.6$ and (i) the coin
is tossed 20 times, or (ii) the coin is tossed 25 times. (This shows the effect of increasing
the sample size on $\alpha$ and $\beta$ for a fixed rejection region.)


7.1.5. Suppose we have a random sample of size 25 from a normal population with an unknown
mean $\mu$ and a standard deviation of 4. We wish to test the hypothesis $H_0: \mu = 10$ vs.
$H_a: \mu > 10$. Let the rejection region be defined by: reject $H_0$ if the sample mean
$\bar{X} > 11.2$.
(a) Find $\alpha$.
(b) Find $\beta$ for $H_a: \mu = 11$.
(c) What should the sample size be if $\alpha = 0.01$ and $\beta = 0.8$?


7.1.6. A process for making steel pipe is under control if the diameter of the pipe has mean 3.0 in.
with standard deviation of no more than 0.0250 in. To check whether the process is under
control, a random sample of size $n = 30$ is taken each day and the null hypothesis $\mu = 3.0$
is rejected if $\bar{X}$ is less than 2.9960 or greater than 3.0040. Find (a) the probability of type I
error; (b) the probability of type II error when $\mu = 3.0050$ in. Assume $\sigma = 0.0250$ in.


7.1.7. A bowl contains 20 balls, of which $x$ are green and the remainder red. To test $H_0: x = 10$
versus $H_a: x = 15$, three balls are selected at random without replacement, and $H_0$ is rejected
if all three balls are green. Calculate $\alpha$ and $\beta$ for this test.


7.1.8. Suppose we have a sample of size 6 from a population with pdf $f(x) = (1/\theta)e^{-x/\theta}$, $x > 0$, $\theta > 0$.
We wish to test $H_0: \theta = 1$ vs. $H_a: \theta > 1$. Let the rejection region be defined by: reject $H_0$ if
$\sum_{i=1}^{6} X_i > 8$. (a) Find $\alpha$. (b) Find $\beta$ for $H_a: \theta = 2$.


7.1.9. Let $\sigma^2 = 16$ be the variance of a normal population from which a random sample is chosen.
How large should the sample size be for testing $H_0: \mu = 25$ versus $H_a: \mu = 24$, in order that
$\alpha = 0.05$ and $\beta = 0.05$?


7.2 THE NEYMAN-PEARSON LEMMA 


In practical hypothesis testing situations, there are typically many tests possible with significance level
$\alpha$ for a null hypothesis versus an alternative hypothesis (see Project 7A). This leads to some important
questions, such as (1) how to decide on the test statistic and (2) how to know that we selected the best 
rejection region. In this section, we study the answer to these questions using the Neyman-Pearson 
approach. 


Definition 7.2.1 Suppose that $W$ is the test statistic and RR is the rejection region for a test of hypothesis
concerning the value of a parameter $\theta$. Then the power of the test is the probability that the test rejects $H_0$
when the alternative is true. That is,

$$\text{Power}(\theta) = P(W \text{ in RR when the parameter value is an alternative } \theta).$$

If $H_0: \theta = \theta_0$ and $H_a: \theta \ne \theta_0$, then the power of the test at some $\theta = \theta_1 \ne \theta_0$ is

$$\text{Power}(\theta_1) = P(\text{reject } H_0 \mid \theta = \theta_1).$$

But $\beta(\theta_1) = P(\text{accept } H_0 \mid \theta = \theta_1)$. Therefore,

$$\text{Power}(\theta_1) = 1 - \beta(\theta_1).$$

A good test will have high power.

Note that the power of a test cannot be found until some true situation $H_a$ is specified. That is,
the sampling distribution of the test statistic when $H_a$ is true must be known or assumed. Because
$\beta$ depends on the alternative hypothesis, which is composite most of the time and so does not specify
the distribution of the test statistic, it is important to observe that the experimenter cannot control
$\beta$. For example, the alternative $H_a: \mu < \mu_0$ does not specify the value of $\mu$, as the null
hypothesis $H_0: \mu = \mu_0$ does.


Example 7.2.1
Let $X_1, \ldots, X_n$ be a random sample from a Poisson distribution with parameter $\lambda$, that is, the pdf is
given by $f(x) = e^{-\lambda}\lambda^{x}/(x!)$. Then the hypothesis $H_0: \lambda = 1$ uniquely specifies the distribution, because
$f(x) = e^{-1}/(x!)$, and hence is a simple hypothesis. The hypothesis $H_a: \lambda > 1$ is composite, because $f(x)$ is
not uniquely determined.


Definition 7.2.2 A test at a given $\alpha$ of a simple hypothesis $H_0$ versus the simple alternative $H_a$ that has
the largest power among tests with the probability of type I error no larger than the given $\alpha$ is called a most
powerful test.


Consider the test of hypothesis $H_0: \theta = \theta_0$ versus $H_a: \theta = \theta_1$. If $\alpha$ is fixed, then our interest is to
make $\beta$ as small as possible. Because $\beta = 1 - \text{Power}(\theta_1)$, by minimizing $\beta$ we would obtain a most
powerful test. The following result says that among all tests with given probability of type I error, the
likelihood ratio test given later minimizes the probability of a type II error; in other words, it is most
powerful.

Theorem 7.2.1 (Neyman-Pearson Lemma) Suppose that one wants to test a simple hypothesis $H_0: \theta = \theta_0$
versus the simple alternative hypothesis $H_a: \theta = \theta_1$ based on a random sample $X_1, \ldots, X_n$ from a
distribution with parameter $\theta$. Let $L(\theta) = L(\theta; X_1, \ldots, X_n) > 0$ denote the likelihood of the sample when
the value of the parameter is $\theta$. If there exist a positive constant $K$ and a subset $C$ of the sample space $R^n$ (the
Euclidean $n$-space) such that

1. $\dfrac{L(\theta_0)}{L(\theta_1)} \le K$ for $(x_1, x_2, \ldots, x_n) \in C$,
2. $\dfrac{L(\theta_0)}{L(\theta_1)} \ge K$ for $(x_1, x_2, \ldots, x_n) \in C'$, where $C'$ is the complement of $C$, and
3. $P[(X_1, \ldots, X_n) \in C; \theta_0] = \alpha$.

Then the test with critical region $C$ will be the most powerful test for $H_0$ versus $H_a$. We call $\alpha$ the size of the
test and $C$ the best critical region of size $\alpha$.
Proof. We prove this theorem for continuous random variables. For discrete random variables, the
proof is identical with sums replacing the integrals. Let $S$ be some region in $R^n$, an $n$-dimensional
Euclidean space. For simplicity we will use the following notation:

$$\int_S L(\theta) = \int \cdots \int_S L(\theta; x_1, \ldots, x_n)\, dx_1 \cdots dx_n.$$

Note that

$$P((X_1, \ldots, X_n) \in C; \theta_0) = \int_C f(x_1, \ldots, x_n; \theta_0)\, dx_1 \cdots dx_n = \int_C L(\theta_0; x_1, \ldots, x_n)\, dx_1 \cdots dx_n.$$

Suppose that there is another critical region, say $B$, of size less than or equal to $\alpha$, that is,
$\int_B L(\theta_0) \le \alpha$. Then

$$0 \le \int_C L(\theta_0) - \int_B L(\theta_0), \quad \text{because } \int_C L(\theta_0) = \alpha \text{ by assumption 3.}$$

Therefore,

$$0 \le \int_C L(\theta_0) - \int_B L(\theta_0) = \int_{C \cap B} L(\theta_0) + \int_{C \cap B'} L(\theta_0) - \int_{C \cap B} L(\theta_0) - \int_{C' \cap B} L(\theta_0) = \int_{C \cap B'} L(\theta_0) - \int_{C' \cap B} L(\theta_0).$$

Using assumption 1 of Theorem 7.2.1, $K L(\theta_1) \ge L(\theta_0)$ at each point in the region $C$ and hence in
$C \cap B'$. Thus

$$\int_{C \cap B'} L(\theta_0) \le K \int_{C \cap B'} L(\theta_1).$$

By assumption 2 of the theorem, $K L(\theta_1) \le L(\theta_0)$ at each point in $C'$, and hence in $C' \cap B$. Thus,

$$\int_{C' \cap B} L(\theta_0) \ge K \int_{C' \cap B} L(\theta_1).$$

Therefore,

$$0 \le \int_{C \cap B'} L(\theta_0) - \int_{C' \cap B} L(\theta_0) \le K \left[\int_{C \cap B'} L(\theta_1) - \int_{C' \cap B} L(\theta_1)\right].$$

That is,

$$0 \le K \left[\int_{C \cap B} L(\theta_1) + \int_{C \cap B'} L(\theta_1) - \int_{C \cap B} L(\theta_1) - \int_{C' \cap B} L(\theta_1)\right] = K \left[\int_C L(\theta_1) - \int_B L(\theta_1)\right].$$

As a result,

$$\int_C L(\theta_1) \ge \int_B L(\theta_1).$$

Because this is true for every critical region $B$ of size $\le \alpha$, $C$ is the best critical region of size $\alpha$, and
the test with critical region $C$ is the most powerful test of size $\alpha$.


When testing two simple hypotheses, the existence of a best critical region is guaranteed by the 
Neyman-Pearson lemma. In addition, the foregoing theorem provides a means for determining 
what the best critical region is. However, it is important to note that Theorem 7.2.1 gives only the 
form of the rejection region; the actual rejection region depends on the specific value of a. 


In real-world situations, we are seldom presented with the problem of testing two simple hypotheses.
There is no general result in the form of Theorem 7.2.1 for composite hypotheses. However, for
hypotheses of the form $H_0: \theta = \theta_0$ versus $H_a: \theta > \theta_0$, we can take a particular value $\theta_1 > \theta_0$ and
then find a most powerful test for $H_0: \theta = \theta_0$ versus $H_a: \theta = \theta_1$. If this test (that is, the rejection
region of the test) does not depend on the particular value $\theta_1$, then this test is said to be a uniformly
most powerful test for $H_0: \theta = \theta_0$ versus $H_a: \theta > \theta_0$.


The following example illustrates the use of the Neyman-Pearson lemma. 


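Example 7.2.2
Let $X_1, \ldots, X_n$ denote an independent random sample from a population with a Poisson distribution with
mean $\lambda$. Derive the most powerful test for testing $H_0: \lambda = 2$ versus $H_a: \lambda = 1/2$.


Solution
Recall that the pdf of a Poisson variable is

$$p(x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^{x}}{x!}, & x = 0, 1, 2, \ldots \\ 0, & \text{otherwise.} \end{cases}$$

Thus, the likelihood function is

$$L(\lambda) = \prod_{i=1}^{n} \frac{e^{-\lambda}\lambda^{x_i}}{x_i!} = \frac{\lambda^{\sum x_i} e^{-n\lambda}}{\prod_{i=1}^{n} x_i!}.$$

For $\lambda = 2$,

$$L(2) = \frac{2^{\sum x_i} e^{-2n}}{\prod_{i=1}^{n} x_i!},$$

and for $\lambda = 1/2$,

$$L(1/2) = \frac{(1/2)^{\sum x_i} e^{-n/2}}{\prod_{i=1}^{n} x_i!}.$$

Thus,

$$\frac{L(2)}{L(1/2)} = \frac{2^{\sum x_i} e^{-2n}}{(1/2)^{\sum x_i} e^{-n/2}} < K,$$

which implies

$$4^{\sum x_i}\, e^{-3n/2} < K$$

or, taking natural logarithms,

$$\left(\sum x_i\right) \ln 4 - \frac{3n}{2} < \ln K.$$

Solving for $\sum x_i$ and letting $[\ln K + (3n/2)]/\ln 4 = K'$, we will reject $H_0$ whenever $\sum x_i < K'$.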
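To turn the form of the rejection region in Example 7.2.2 into an actual cutoff, one needs the null distribution of $\sum X_i$. The sketch below relies on the standard fact, not stated in the text, that the sum of $n$ independent Poisson($\lambda$) variables is Poisson($n\lambda$). The helper names and the choice $n = 5$, $\alpha = 0.05$ are assumptions made for illustration.

```python
from math import exp, factorial, log

def poisson_cdf(k, mean):
    """P(S <= k) for S ~ Poisson(mean)."""
    return sum(exp(-mean) * mean**i / factorial(i) for i in range(k + 1))

def log_likelihood_ratio(xs, lam0=2.0, lam1=0.5):
    """ln[L(lam0) / L(lam1)] for a Poisson sample; the x_i! terms cancel."""
    n, s = len(xs), sum(xs)
    return s * log(lam0 / lam1) - n * (lam0 - lam1)

def lower_cutoff(n, lam0, alpha):
    """Largest k with P(sum X_i <= k | lam0) <= alpha; returns -1 if no such k exists."""
    k = -1
    while poisson_cdf(k + 1, n * lam0) <= alpha:
        k += 1
    return k

n, lam0, alpha = 5, 2.0, 0.05          # hypothetical sample size and level
k = lower_cutoff(n, lam0, alpha)
print(f"reject H0 when sum(x) <= {k}; attained size = {poisson_cdf(k, n * lam0):.4f}")
```

Since the log-likelihood ratio is increasing in $\sum x_i$, small values of the ratio correspond to small values of the sum, which is why the search is over lower-tail cutoffs. Because the distribution is discrete, the attained size is generally below the nominal $\alpha$.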


A step-by-step procedure in applying the Neyman-Pearson lemma is now given. 


PROCEDURE FOR APPLYING THE NEYMAN-PEARSON LEMMA 
1. Determine the likelihood functions under both null and alternative hypotheses. 
2. Take the ratio of the two likelihood functions to be less than a constant K. 
3. Simplify the inequality in step 2 to obtain a rejection region. 


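Example 7.2.3
Suppose $X_1, \ldots, X_n$ is a random sample from a normal distribution with a known mean of $\mu$ and an
unknown variance of $\sigma^2$. Find the most powerful $\alpha$-level test for testing $H_0: \sigma^2 = \sigma_0^2$ versus $H_a: \sigma^2 = \sigma_1^2$
($\sigma_1^2 > \sigma_0^2$). Show that this test is equivalent to the $\chi^2$-test. Is the test uniformly most powerful for
$H_a: \sigma^2 > \sigma_0^2$?

Solution
To test $H_0: \sigma^2 = \sigma_0^2$ versus $H_a: \sigma^2 = \sigma_1^2$, we have

$$L(\sigma_0^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_0}\, e^{-\frac{(x_i - \mu)^2}{2\sigma_0^2}} = \frac{1}{(\sqrt{2\pi})^{n} \sigma_0^{n}}\, e^{-\frac{\sum (x_i - \mu)^2}{2\sigma_0^2}}.$$

Similarly,

$$L(\sigma_1^2) = \frac{1}{(\sqrt{2\pi})^{n} \sigma_1^{n}}\, e^{-\frac{\sum (x_i - \mu)^2}{2\sigma_1^2}}.$$

Therefore, the most powerful test is: reject $H_0$ if

$$\frac{L(\sigma_0^2)}{L(\sigma_1^2)} = \left(\frac{\sigma_1}{\sigma_0}\right)^{n} \exp\left[-\frac{(\sigma_1^2 - \sigma_0^2)}{2\sigma_0^2\sigma_1^2} \sum (x_i - \mu)^2\right] \le K$$

for some $K$.

Taking the natural logarithms, we have

$$n \ln\left(\frac{\sigma_1}{\sigma_0}\right) - \frac{(\sigma_1^2 - \sigma_0^2)}{2\sigma_0^2\sigma_1^2} \sum (x_i - \mu)^2 \le \ln K$$

or

$$\sum (x_i - \mu)^2 \ge \left[n \ln\left(\frac{\sigma_1}{\sigma_0}\right) - \ln K\right] \left(\frac{2\sigma_0^2\sigma_1^2}{\sigma_1^2 - \sigma_0^2}\right) = C.$$

To find the rejection region for a fixed value of $\alpha$, write the region as

$$\frac{\sum (x_i - \mu)^2}{\sigma_0^2} \ge \frac{C}{\sigma_0^2} = C'.$$

Note that $\sum (x_i - \mu)^2/\sigma_0^2$ has a $\chi^2$-distribution with $n$ degrees of freedom under $H_0$. Because the same
rejection region (it does not depend upon the specific value of $\sigma_1^2$ in the alternative) would be used for any
$\sigma_1^2 > \sigma_0^2$, the test is uniformly most powerful.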


The foregoing example shows that, in order to test for variance using a sample from a normal
distribution, we could use the chi-square table to obtain the critical value for the rejection region
given $\alpha$.


EXERCISES 7.2


7.2.1. Suppose $X_1, \ldots, X_n$ is a random sample from a normal distribution with a known variance
of $\sigma^2$ and an unknown mean of $\mu$. Find the most powerful $\alpha$-level test of $H_0: \mu = \mu_0$ versus
$H_a: \mu = \mu_a$ if (a) $\mu_0 > \mu_a$, and (b) $\mu_a > \mu_0$.

7.2.2. Show that the most powerful test obtained in Example 7.2.1 is uniformly most powerful for
testing $H_0: \mu \le \mu_0$ versus $H_a: \mu > \mu_0$, but not uniformly most powerful for testing $H_0: \mu = \mu_0$
versus $H_a: \mu \ne \mu_0$.


7.2.3. Suppose $X_1, \ldots, X_n$ is a random sample from a $U(0, \theta)$ distribution. Find the most powerful
$\alpha$-level test for testing $H_0: \theta = \theta_0$ versus $H_a: \theta = \theta_1$, where $\theta_0 < \theta_1$.


7.2.4. Let $X_1, \ldots, X_n$ be a random sample from a geometric distribution with parameter $p$. Find the
most powerful test of $H_0: p = p_0$ versus $H_a: p = p_a$ ($> p_0$). Is this a uniformly most powerful
test for $H_0: p = p_0$ versus $H_a: p > p_0$?


7.2.5. Let $X_1, \ldots, X_n$ be a random sample from a distribution having a pdf of
_~ 
fO) = “re ag ifx >0 
0, otherwise. 


Find a uniformly most powerful test for testing $H_0: \eta = \eta_0$ versus $H_a: \eta < \eta_0$.


7.2.6. Let $X$ be a single observation from the pdf

$$f(x) = \begin{cases} \theta x^{\theta - 1}, & 0 < x < 1 \\ 0, & \text{otherwise.} \end{cases}$$

Find the most powerful test with a level of significance $\alpha = 0.01$ to test $H_0: \theta = 3$ versus
$H_a: \theta = 4$.


7.2.7. Let $X_1, \ldots, X_n$ be a random sample from a Bernoulli distribution with parameter $p$. Find the
most powerful test of $H_0: p = p_0$ versus $H_a: p = p_a$, where $p_a > p_0$.


7.2.8. Let $X_1, \ldots, X_n$ be a random sample from a Poisson distribution with mean $\lambda$. Find a best
critical region for testing $H_0: \lambda = 3$ against $H_a: \lambda = 6$.


7.3 LIKELIHOOD RATIO TESTS 


The Neyman-Pearson lemma provides a method for constructing most powerful tests for simple
hypotheses. We also have seen that in some instances when a hypothesis is not simple, it is possible
to find uniformly most powerful tests. In general, uniformly most powerful (UMP) tests do
not exist for composite hypotheses. As an example, consider the two-sided hypothesis, at level $\alpha$,
given by

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_a: \mu \ne \mu_0$$

where $\mu$ is the mean of a normal population with known variance $\sigma^2$. If $\bar{X}$ is the sample mean of a
random sample of size $n$, then as shown earlier, we can use the test statistic

$$z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}.$$

For $H_a: \mu = \mu_1 > \mu_0$, the rejection region for the most powerful test would be

Reject $H_0$ if $z > z_\alpha$.

On the other hand, for $H_a: \mu = \mu_2 < \mu_0$, the rejection region for the most powerful test would be

Reject $H_0$ if $z < -z_\alpha$.


Thus, the rejection region depends on the specific alternative. Consequently, the two-sided hypothesis
just given has no UMP test.


In this section, we shall study a general procedure that is applicable when one or both of $H_0$ and $H_a$ are
composite. In fact, this procedure works for simple hypotheses as well. This method is based on
maximum likelihood estimation and the ratio of likelihood functions used in the Neyman-Pearson
lemma. We assume that the pdf or pmf of the random variable $X$ is $f(x, \theta)$, where $\theta$ can be one or
more unknown parameters. Let $\Theta$ represent the total parameter space, that is, the set of all possible
values of the parameter $\theta$ given by either $H_0$ or $H_a$.


Consider the hypotheses

$$H_0: \theta \in \Theta_0 \quad \text{vs.} \quad H_a: \theta \in \Theta_a = \Theta - \Theta_0,$$

where $\theta$ is the unknown population parameter (or parameters) with values in $\Theta$, and $\Theta_0$ is a subset
of $\Theta$.


Let $L(\theta)$ be the likelihood function based on the sample $X_1, \ldots, X_n$. Now we define the likelihood
ratio corresponding to the hypotheses $H_0$ and $H_a$. This ratio will be used as a test statistic for the
testing procedure that we develop in this section. This is a natural generalization of the ratio test used
in the Neyman-Pearson lemma when both hypotheses were simple.


Definition 7.3.1 The likelihood ratio $\lambda$ is the ratio

$$\lambda = \frac{\max_{\theta \in \Theta_0} L(\theta; x_1, \ldots, x_n)}{\max_{\theta \in \Theta} L(\theta; x_1, \ldots, x_n)} = \frac{L_0^*}{L^*}.$$

We note that $0 \le \lambda \le 1$. Because $\lambda$ is the ratio of nonnegative functions, $\lambda \ge 0$. Because $\Theta_0$ is a subset
of $\Theta$, we know that $\max_{\theta \in \Theta_0} L(\theta) \le \max_{\theta \in \Theta} L(\theta)$. Hence, $\lambda \le 1$.


If the maximum of $L$ in $\Theta_0$ is much smaller than the maximum of $L$ in $\Theta$, that is, if
$\lambda$ is small, it would appear that the data $X_1, \ldots, X_n$ do not support the null hypothesis $\theta \in \Theta_0$. On
the other hand, if $\lambda$ is close to 1, one could conclude that the data support the null hypothesis, $H_0$.
Therefore, small values of $\lambda$ would result in rejection of the null hypothesis, and large values nearer
to 1 will result in a decision in support of the null hypothesis.

For the evaluation of $\lambda$, it is important to note that $\max_{\theta \in \Theta} L(\theta) = L(\hat{\theta}_{ml})$, where $\hat{\theta}_{ml}$ is the maximum
likelihood estimator of $\theta \in \Theta$, and $\max_{\theta \in \Theta_0} L(\theta)$ is the likelihood function with unknown parameters
replaced by their maximum likelihood estimators subject to the condition that $\theta \in \Theta_0$. We can
summarize the likelihood ratio test as follows.


LIKELIHOOD RATIO TESTS (LRTs)

To test

$$H_0: \theta \in \Theta_0 \quad \text{vs.} \quad H_a: \theta \in \Theta_a,$$

$$\lambda = \frac{\max_{\theta \in \Theta_0} L(\theta; x_1, \ldots, x_n)}{\max_{\theta \in \Theta} L(\theta; x_1, \ldots, x_n)} = \frac{L_0^*}{L^*}$$

will be used as the test statistic.
The rejection region for the likelihood ratio test is given by


Reject $H_0$ if $\lambda \le K$.


$K$ is selected such that the test has the given significance level $\alpha$.




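Example 7.3.1
Let $X_1, \ldots, X_n$ be a random sample from an $N(\mu, \sigma^2)$. Assume that $\sigma^2$ is known. We wish to test, at level
$\alpha$, $H_0: \mu = \mu_0$ vs. $H_a: \mu \ne \mu_0$. Find an appropriate likelihood ratio test.


Solution
We have seen that to test

$$H_0: \mu = \mu_0 \quad \text{vs.} \quad H_a: \mu \ne \mu_0$$

there is no uniformly most powerful test for this case. The likelihood function is

$$L(\mu) = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left(-\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2}\right).$$

Here, $\Theta_0 = \{\mu_0\}$ and $\Theta_a = R - \{\mu_0\}$.
Hence,

$$L_0^* = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left(-\frac{\sum_{i=1}^{n} (x_i - \mu_0)^2}{2\sigma^2}\right).$$

Similarly,

$$L^* = \max_{-\infty < \mu < \infty} \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \exp\left(-\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2}\right).$$

Because the only unknown parameter in the parameter space $\Theta$ is $\mu$, $-\infty < \mu < \infty$, the maximum of the
likelihood function is achieved when $\mu$ equals its maximum likelihood estimator, that is,

$$\hat{\mu}_{ml} = \bar{X}.$$

Therefore, with a simple calculation we have

$$\lambda = \frac{\exp\left(-\sum_{i=1}^{n} (x_i - \mu_0)^2 / 2\sigma^2\right)}{\exp\left(-\sum_{i=1}^{n} (x_i - \bar{x})^2 / 2\sigma^2\right)} = e^{-n(\bar{x} - \mu_0)^2 / 2\sigma^2}.$$

Thus, the likelihood ratio test has the rejection region

Reject $H_0$ if $\lambda \le K$

which is equivalent to

$$-\frac{n}{2\sigma^2}(\bar{X} - \mu_0)^2 \le \ln K \iff \left|\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| \ge \sqrt{-2 \ln K} = c_1, \text{ say.}$$

Note that we use the symbol $\iff$ to mean "if and only if." We now compute $c_1$. Under $H_0$,
$(\bar{X} - \mu_0)/(\sigma/\sqrt{n}) \sim N(0, 1)$.
Observe that

$$P\left(\left|\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| \ge c_1\right) = \alpha$$

gives a possible value of $c_1$ as $c_1 = z_{\alpha/2}$. Hence, the LRT for the given hypothesis is

Reject $H_0$ if $\left|\dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}\right| \ge z_{\alpha/2}$.

Thus, in this case, the likelihood ratio test is equivalent to the z-test for large random samples.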
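The equivalence between the likelihood ratio and the two-sided z-test in Example 7.3.1 can be checked numerically. The following sketch is illustrative only: the function name `lrt_known_variance` and the data in `sample` are made up for this purpose and do not come from the text.

```python
from math import exp, sqrt
from statistics import NormalDist, fmean

def lrt_known_variance(xs, mu0, sigma, alpha):
    """Likelihood ratio test of H0: mu = mu0 vs Ha: mu != mu0 with sigma known.

    Returns (lam, z, reject), where lam is the likelihood ratio and
    z is the usual standardized sample mean.
    """
    n = len(xs)
    xbar = fmean(xs)                                     # MLE of mu over the full space
    z = (xbar - mu0) / (sigma / sqrt(n))
    lam = exp(-n * (xbar - mu0) ** 2 / (2 * sigma**2))   # algebraically exp(-z^2 / 2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)         # z_{alpha/2}
    return lam, z, abs(z) >= z_crit

sample = [10.8, 9.4, 11.9, 12.3, 10.1, 11.5, 12.8, 9.9, 11.2]   # hypothetical data
lam, z, reject = lrt_known_variance(sample, mu0=10.0, sigma=1.5, alpha=0.05)
print(f"lambda = {lam:.4f}, z = {z:.3f}, reject H0: {reject}")
```

Since $\lambda = e^{-z^2/2}$ is a decreasing function of $|z|$, rejecting for small $\lambda$ is the same as rejecting for large $|z|$, which is what the final comparison in the function implements.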


In fact, when both the hypotheses are simple, the likelihood ratio test is identical to the Neyman-
Pearson test. We can now summarize the procedure for the likelihood ratio test, LRT.

PROCEDURE FOR THE LIKELIHOOD RATIO TEST (LRT)
1. Find the largest value of the likelihood $L(\theta)$ for any $\theta_0 \in \Theta_0$ by finding the maximum likelihood
estimate within $\Theta_0$ and substituting back into the likelihood function.
2. Find the largest value of the likelihood $L(\theta)$ for any $\theta \in \Theta$ by finding the maximum likelihood
estimate within $\Theta$ and substituting back into the likelihood function.
3. Form the ratio

$$\lambda = \lambda(x_1, x_2, \ldots, x_n) = \frac{\max L(\theta) \text{ in } \Theta_0}{\max L(\theta) \text{ in } \Theta}.$$

4. Determine a $K$ so that the test has the desired probability of type I error, $\alpha$.
5. Reject $H_0$ if $\lambda \le K$.


In the next example, we find an LRT for a testing problem when both $H_0$ and $H_a$ are simple.


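Example 7.3.2
Machine 1 produces 5% defectives. Machine 2 produces 10% defectives. Ten items produced by each of
the machines are sampled randomly; $X$ = number of defectives. Let $\theta$ be the true proportion of defectives.
Test $H_0: \theta = 0.05$ versus $H_a: \theta = 0.1$. Use $\alpha = 0.05$.


Solution
We need to test $H_0: \theta = 0.05$ vs. $H_a: \theta = 0.1$. Let

$$L(\theta) = \begin{cases} \dbinom{10}{x} (0.05)^{x} (0.95)^{10-x}, & \text{if } \theta = 0.05 \\[2ex] \dbinom{10}{x} (0.1)^{x} (0.90)^{10-x}, & \text{if } \theta = 0.10. \end{cases}$$

Write

$$L_1 = L(0.05) = \binom{10}{x} (0.05)^{x} (0.95)^{10-x}$$

and

$$L_2 = L(0.1) = \binom{10}{x} (0.1)^{x} (0.90)^{10-x}.$$

Thus, we have

$$\frac{L_1}{L_2} = \frac{(0.05)^{x} (0.95)^{10-x}}{(0.1)^{x} (0.9)^{10-x}} = \left(\frac{1}{2}\right)^{x} \left(\frac{19}{18}\right)^{10-x}.$$

The ratio is

$$\lambda = \frac{L_1}{\max(L_1, L_2)}.$$

Note that if $\max(L_1, L_2) = L_1$, then $\lambda = 1$. Because we want to reject for small values of $\lambda$, we take $\max(L_1, L_2) = L_2$,
and we reject $H_0$ if $L_1/L_2 \le K$ or, equivalently, $L_2/L_1 \ge K'$ for a suitable constant $K'$ (note that $L_2/L_1 = 2^{x}(18/19)^{10-x}$).
That is, reject $H_0$ if

$$2^{x} \left(\frac{18}{19}\right)^{10-x} \ge K'.$$

The left side is increasing in $x$. Hence, reject $H_0$ if $X \ge C$, where $C$ satisfies $P(X \ge C \mid H_0: \theta = 0.05) \le 0.05$.
Using the binomial tables, we have

$$P(X > 2 \mid \theta = 0.05) = 0.0116$$

and

$$P(X \ge 2 \mid \theta = 0.05) = 0.0862.$$

Reject $H_0$ if $X > 2$. If we want $\alpha$ to be exactly 0.05, we have to use a randomized test: in addition, reject with
probability $(0.05 - 0.0116)/(0.0862 - 0.0116) = 0.0384/0.0746 \approx 0.51$ if $X = 2$.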
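The tail probabilities and the randomization probability in Example 7.3.2 can be recomputed without tables. This is a small sketch using only the standard library; the variable name `gamma` for the randomization probability is a label chosen here, not notation from the text.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p0, alpha = 10, 0.05, 0.05

tail_gt_2 = sum(binom_pmf(k, n, p0) for k in range(3, n + 1))   # P(X > 2 | theta = 0.05)
p_eq_2 = binom_pmf(2, n, p0)                                    # P(X = 2 | theta = 0.05)

# Reject outright when X > 2, and reject with probability gamma when X = 2,
# so that the overall size is exactly alpha.
gamma = (alpha - tail_gt_2) / p_eq_2
size = tail_gt_2 + gamma * p_eq_2

print(f"P(X > 2) = {tail_gt_2:.4f}, P(X = 2) = {p_eq_2:.4f}, gamma = {gamma:.4f}, size = {size:.4f}")
```

Values computed this way are unrounded, so they can differ in the last digit or two from figures read off a four-decimal binomial table.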


The likelihood ratio tests do not always produce a test statistic with a known probability distribution,
such as the z-statistic of Example 7.3.1. If we have a large sample size, then we can obtain an
approximation to the distribution of the statistic $\lambda$, which is beyond the level of this book.


EXERCISES 7.3 


7.3.1. Let $X_1, \ldots, X_n$ be a random sample from an $N(\mu, \sigma^2)$. Assume that $\sigma^2$ is unknown. We wish
to test, at level $\alpha$, $H_0: \mu = \mu_0$ vs. $H_a: \mu < \mu_0$. Find an appropriate likelihood ratio test.


7.3.2. Let $X_1, \ldots, X_n$ be a random sample from an $N(\mu, \sigma^2)$. Assume that both $\mu$ and $\sigma^2$ are
unknown. We wish to test, at level $\alpha$, $H_0: \sigma^2 = \sigma_0^2$ vs. $H_a: \sigma^2 > \sigma_0^2$. Find an appropriate
likelihood ratio test.


7.3.3. Let $X_1, \ldots, X_n$ be a random sample from an $N(\mu_1, \sigma^2)$ and let $Y_1, Y_2, \ldots, Y_n$ be an independent
sample from an $N(\mu_2, \sigma^2)$, where $\sigma^2$ is unknown. We wish to test, at level $\alpha$, $H_0: \mu_1 = \mu_2$
vs. $H_a: \mu_1 \ne \mu_2$. Find an appropriate likelihood ratio test.


7.3.4. Let $X_1, \ldots, X_n$ be a sample from a Poisson distribution with parameter $\lambda$. Show that a likelihood
ratio test of $H_0: \lambda = \lambda_0$ vs. $H_a: \lambda \ne \lambda_0$ rejects the null hypothesis if $\bar{X} \ge m_1$ or
$\bar{X} \le m_2$.

7.3.5. Let $X_1, \ldots, X_n$ be a sample from an exponential distribution with parameter $\theta$. Show that a
likelihood ratio test of $H_0: \theta = \theta_0$ vs. $H_a: \theta \ne \theta_0$ rejects the null hypothesis if $\sum_{i=1}^{n} X_i \ge m_1$
or $\sum_{i=1}^{n} X_i \le m_2$.


7.3.6. A clinical oncology program developed a set of guidelines for their cancer patients to follow.
It is believed that the proportion of patients who are still living after 24 months is greater
for those who follow the guidelines. Of the 40 patients who followed the guidelines, 30 are
still living after 24 months, whereas of 32 patients who did not follow the guidelines, 21 are
living after 24 months. Find a likelihood ratio test at $\alpha = 0.01$ to decide whether the program
is effective.


7.4 HYPOTHESES FOR A SINGLE PARAMETER 


In this section, we first introduce the concept of p-value. After that, we study hypothesis testing 
concerning a single parameter. 


7.4.1 The p-Value 


In hypothesis testing, the choice of the value of a is somewhat arbitrary. For the same data, if the test 
is based on two different values of a, the conclusions could be different. Many statisticians prefer to 
compute the so-called p-value, which is calculated based on the observed test statistic. For computing 
the p-value, it is not necessary to specify a value of a. We can use the given data to obtain the 
p-value. 


Definition 7.4.1 Corresponding to an observed value of a test statistic, the p-value (or attained 
significance level) is the lowest level of significance at which the null hypothesis would have been 
rejected. 


For example, if we are testing a given hypothesis with $\alpha = 0.05$, we make a decision to reject $H_0$,
and we then calculate the p-value to be 0.03, this means that we could have used an $\alpha$ as
low as 0.03 and still maintained the same decision, rejecting $H_0$.


Based on the alternative hypothesis, one can use the following steps to compute the p-value. 


STEPS TO FIND THE p-VALUE
1. Let TS be the test statistic.
2. Compute the value of TS using the sample $X_1, \ldots, X_n$. Say it is $a$.
3. The p-value is given by

$$p\text{-value} = \begin{cases} P(TS < a \mid H_0), & \text{if lower tail test} \\ P(TS > a \mid H_0), & \text{if upper tail test} \\ P(|TS| > |a| \mid H_0), & \text{if two tail test.} \end{cases}$$

Example 7.4.1
To test $H_0: \mu = 0$ vs. $H_a: \mu \ne 0$, suppose that the test statistic $Z$ results in a computed value of 1.58.
Then, the p-value $= P(|Z| > 1.58) = 2(0.0571) = 0.1142$. That is, we must have a type I error of 0.1142 in
order to reject $H_0$. Also, if $H_a: \mu > 0$, then the p-value would be $P(Z > 1.58) = 0.0571$. In this case we
must have an $\alpha$ of 0.0571 in order to reject $H_0$.
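For a test statistic that is standard normal under $H_0$, the three cases in the steps above translate directly into code. The sketch below is a minimal illustration; the function name `z_p_value` and its `tail` argument are assumptions introduced here.

```python
from statistics import NormalDist

def z_p_value(ts, tail):
    """p-value for an observed z statistic `ts`.

    tail: 'lower', 'upper', or 'two', matching the form of the alternative hypothesis.
    """
    z = NormalDist()                          # null distribution of the test statistic
    if tail == "lower":
        return z.cdf(ts)                      # P(Z < ts)
    if tail == "upper":
        return 1 - z.cdf(ts)                  # P(Z > ts)
    if tail == "two":
        return 2 * (1 - z.cdf(abs(ts)))       # P(|Z| > |ts|)
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

print(f"two-tail p-value for z = 1.58:   {z_p_value(1.58, 'two'):.4f}")
print(f"upper-tail p-value for z = 1.58: {z_p_value(1.58, 'upper'):.4f}")
```

The two-tail value is exactly twice the upper-tail value for the same observed statistic, which reflects the symmetry of the standard normal distribution.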


The p-value can be thought of as a measure of support for the null hypothesis: The lower its value, 
the lower the support. Typically one decides that the support for Ho is insufficient when the p-value 
drops below a particular threshold, which is the significance level of the test. 


REPORTING TEST RESULT AS p-VALUES 
1. Choose the maximum value of $\alpha$ that you are willing to tolerate.
2. If the p-value of the test is less than the maximum value of $\alpha$, reject $H_0$.


If the exact p-value cannot be found, one can give an interval in which the p-value can lie. For example,
if the test is significant at $\alpha = 0.05$ but not significant for $\alpha = 0.025$, report that $0.025 \le$ p-value $\le$
0.05. So for $\alpha \ge 0.05$, reject $H_0$, and for $\alpha \le 0.025$, do not reject $H_0$.


In another interpretation, 1— (p-value) is considered as an index of the strength of the evidence against 
the null hypothesis provided by the data. It is clear that the value of this index lies in the interval 
[0, 1]. If the p-value is 0.02, the value of the index is 0.98, supporting the rejection of the null hypothesis.
Not only do p-values provide us with a yes or no answer, they provide a sense of the strength of the 
evidence against the null hypothesis. The lower the p-value, the stronger the evidence. Thus, in any 
test, reporting the p-value of the test is a good practice. 


Because most of the outputs from statistical software used for hypothesis testing include the p-value, 
the p-value approach to hypothesis testing is becoming more and more popular. In this approach, 
the decision of the test is made in the following way. If the value of $\alpha$ is given, and if the p-value of the
test is less than the value of $\alpha$, we will reject $H_0$. If the value of $\alpha$ is not given and the p-value associated
with the test is small (usually taken to mean p-value < 0.05), there is evidence to reject the null hypothesis in
favor of the alternative. In other words, there is evidence that the value of the true parameter (such as
the population mean) is significantly different from (greater than, or less than) the hypothesized value. If the
p-value associated with the test is not small (p > 0.05), we conclude that there is not enough evidence
to reject the null hypothesis. In most of the examples in this chapter, we give both the rejection region 
and p-value approaches. 


Example 7.4.2 
The management of a local health club claims that its members lose on the average 15 pounds or more 
within the first 3 months after joining the club. To check this claim, a consumer agency took a random 
sample of 45 members of this health club and found that they lost an average of 13.8 pounds within the 
first 3 months of membership, with a sample standard deviation of 4.2 pounds. 




(a) Find the p-value for this test.
(b) Based on the p-value in (a), would you reject the null hypothesis at $\alpha = 0.01$?

Solution

(a) Let $\mu$ be the true mean weight loss in pounds within the first 3 months of membership in this club.
Then we have to test the hypothesis

$$H_0: \mu = 15 \quad \text{versus} \quad H_a: \mu < 15.$$

Here $n = 45$, $\bar{x} = 13.8$, and $s = 4.2$. Because $n = 45 > 30$, we can use the normal approximation.
Hence, the test statistic is

$$z = \frac{13.8 - 15}{4.2/\sqrt{45}} = -1.9166$$

and

p-value $= P(Z < -1.9166) \approx P(Z < -1.92) = 0.0274$.

Thus, we can use an $\alpha$ as small as 0.0274 and still reject $H_0$.

(b) No. Because the p-value $= 0.0274$ is greater than $\alpha = 0.01$, one cannot reject $H_0$.
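The arithmetic of Example 7.4.2 can be reproduced as follows. This is a sketch using only the Python standard library; the variable names are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Summary data from Example 7.4.2
n, xbar, s, mu0 = 45, 13.8, 4.2, 15.0

z = (xbar - mu0) / (s / sqrt(n))   # large-sample test statistic
p_value = NormalDist().cdf(z)      # lower-tail alternative: P(Z < z)

alpha = 0.01
reject = p_value < alpha           # p-value decision rule
print(f"z = {z:.4f}, p-value = {p_value:.4f}, reject H0: {reject}")
```

Evaluating the tail area at the unrounded statistic gives a p-value slightly larger than the table value obtained after rounding $z$ to $-1.92$; the decision at $\alpha = 0.01$ is the same either way.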


In any hypothesis testing, after an experimenter determines the objective of an experiment and decides 
on the type of data to be collected, we recommend the following step-by-step procedure for hypothesis 
testing. 


STEPS IN ANY HYPOTHESIS TESTING PROBLEM
1. State the alternative hypothesis, $H_a$ (what is believed to be true).

2. State the null hypothesis, $H_0$ (what is doubted to be true).

3. Decide on a level of significance $\alpha$.

4. Choose an appropriate TS and compute the observed test statistic.

5. Using the distribution of TS and $\alpha$, determine the rejection region(s) (RR).

6. Conclusion: If the observed test statistic falls in the RR, reject $H_0$ and conclude that based on the
sample information, we are $(1 - \alpha)100\%$ confident that $H_a$ is true. Otherwise, conclude that there is
not sufficient evidence to reject $H_0$. In all the applied problems, interpret the meaning of your
decision.

7. State any assumptions you made in testing the given hypothesis.

8. Compute the p-value from the null distribution of the test statistic and interpret it.


7.4.2 Hypothesis Testing for a Single Parameter 


Now we study the testing of a hypothesis concerning a single parameter, $\theta$, based on a random sample
$X_1, \ldots, X_n$. Let $\hat{\theta}$ be the sample statistic. First, we deal with tests for the population mean $\mu$ for large
and small samples. Next, we study procedures for testing the population variance $\sigma^2$. We conclude
the section by studying a test procedure for the true proportion $p$.

To test the hypothesis $H_0: \mu = \mu_0$ concerning the true population mean $\mu$, when we have a large
sample ($n \ge 30$) we use the test statistic $Z$ given by

$$Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

where $S$ is the sample standard deviation and $\mu_0$ is the claimed mean under $H_0$ (if the population
variance is known, we replace $S$ with $\sigma$).

For a small random sample ($n < 30$), the test statistic is

$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$

where $\mu_0$ is the claimed value of the true mean, and $\bar{X}$ and $S$ are the sample mean and standard
deviation, respectively. Note that we are using the lowercase letters, such as $z$ and $t$, to represent the
observed values of the test statistics $Z$ and $T$, respectively.


In practice, with raw data, it is important to verify the assumptions. For example, in the small sample 
case, it is important to check for normality by using normal plots. If this assumption is not satisfied, 
the nonparametric methods described in Chapter 12 may be more appropriate. In addition, because
sample statistics such as $\bar{X}$ and $S$ will be greatly affected by the presence of outliers, drawing a box
plot to check for outliers is a basic practice we should incorporate in our analysis.


We now summarize the typical test of hypothesis for tests concerning population (true) mean. 


In order to compute the observed test statistic, $z$ in the large sample case and $t$ in the small sample
case, calculate the values of $z = (\bar{x} - \mu_0)/(s/\sqrt{n})$ and $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$, respectively.


SUMMARY OF HYPOTHESIS TESTS FOR $\mu$

Large Sample ($n \ge 30$)
To test
$$H_0: \mu = \mu_0$$
versus
$$H_a: \begin{cases} \mu > \mu_0, & \text{upper tail test} \\ \mu < \mu_0, & \text{lower tail test} \\ \mu \neq \mu_0, & \text{two-tailed test} \end{cases}$$
Test statistic:
$$Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$$
Replace $\sigma$ by $S$, if $\sigma$ is unknown.
Rejection region:
$$\begin{cases} z > z_{\alpha}, & \text{upper tail RR} \\ z < -z_{\alpha}, & \text{lower tail RR} \\ |z| > z_{\alpha/2}, & \text{two tail RR} \end{cases}$$
Assumption: $n \ge 30$

Small Sample ($n < 30$)
To test
$$H_0: \mu = \mu_0$$
versus
$$H_a: \begin{cases} \mu > \mu_0, & \text{upper tail test} \\ \mu < \mu_0, & \text{lower tail test} \\ \mu \neq \mu_0, & \text{two-tailed test} \end{cases}$$
Test statistic:
$$T = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$$
Rejection region:
$$\begin{cases} t > t_{\alpha, n-1}, & \text{upper tail RR} \\ t < -t_{\alpha, n-1}, & \text{lower tail RR} \\ |t| > t_{\alpha/2, n-1}, & \text{two tail RR} \end{cases}$$
Assumption: Random sample comes from a normal population

Decision: Reject $H_0$, if the observed test statistic falls in the RR and conclude that $H_a$ is true with
$(1 - \alpha)100\%$ confidence. Otherwise, keep $H_0$, because there is not enough evidence to conclude that
$H_a$ is true for the given $\alpha$, and more experiments may be needed.
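The large-sample column of the summary can be sketched as one small function. The function name `large_sample_mean_test` and the tail labels `"upper"`, `"lower"`, and `"two"` are hypothetical choices made for this illustration; only the Python standard library is used, and the small-sample case is left out because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist

def large_sample_mean_test(xbar, s, n, mu0, alpha, tail):
    """Return (z, reject, p_value) for H0: mu = mu0 with n >= 30.

    tail selects the alternative: 'upper' (mu > mu0), 'lower' (mu < mu0),
    or 'two' (mu != mu0). These labels are illustrative only.
    """
    std_normal = NormalDist()
    z = (xbar - mu0) / (s / sqrt(n))
    if tail == "upper":
        crit = std_normal.inv_cdf(1 - alpha)       # z_alpha
        return z, z > crit, 1 - std_normal.cdf(z)
    if tail == "lower":
        crit = std_normal.inv_cdf(1 - alpha)       # z_alpha
        return z, z < -crit, std_normal.cdf(z)
    if tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)   # z_{alpha/2}
        return z, abs(z) > crit, 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'upper', 'lower', or 'two'")

# Health-club data of Example 7.4.2: lower tail test at alpha = 0.01
z, reject, p_value = large_sample_mean_test(13.8, 4.2, 45, 15.0, 0.01, "lower")
print(round(z, 3), reject, round(p_value, 4))
```

Returning the p-value together with the rejection-region decision mirrors the chapter's practice of giving both approaches; the two always agree, because $z$ falls in the RR exactly when the p-value is below $\alpha$.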


Example 7.4.3
It is claimed that sports-car owners drive on the average 18,000 miles per year. A consumer firm believes that
the average mileage is probably lower. To check, the consumer firm obtained information from 40 randomly
selected sports-car owners that resulted in a sample mean of 17,463 miles with a sample standard deviation
of 1348 miles. What can we conclude about this claim? Use $\alpha = 0.01$.


Solution

Let $\mu$ be the true population mean. We can formulate the hypotheses as $H_0: \mu = 18{,}000$ versus
$H_a: \mu < 18{,}000$.

The observed test statistic (for $n \ge 30$, with $s$ in place of the unknown $\sigma$) is

$$z = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{17{,}463 - 18{,}000}{1348/\sqrt{40}} = -2.52.$$

Rejection region is $\{z < -z_{0.01}\} = \{z < -2.33\}$.

Decision: Because $z = -2.52$ is less than $-2.33$, the null hypothesis is rejected at $\alpha = 0.01$. There is
sufficient evidence to conclude that the mean mileage on sports cars is less than 18,000 miles per year.
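As a numerical check of Example 7.4.3, the sketch below computes the statistic, the critical value $z_{0.01}$, and the p-value with the Python standard library. The variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

# Summary data from Example 7.4.3
n, xbar, s, mu0, alpha = 40, 17463.0, 1348.0, 18000.0, 0.01

z = (xbar - mu0) / (s / sqrt(n))            # observed test statistic
z_alpha = NormalDist().inv_cdf(1 - alpha)   # critical value z_{0.01}
reject = z < -z_alpha                       # lower tail rejection region
p_value = NormalDist().cdf(z)               # P(Z < z)

print(f"z = {z:.2f}, z_alpha = {z_alpha:.2f}, p-value = {p_value:.4f}")
```

The p-value lies below $\alpha = 0.01$, which is the p-value form of the same rejection decision.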


Example 7.4.4
In a frequently traveled stretch of the I-75 highway, where the posted speed is 70 mph, it is thought that
people travel on the average of at least 75 mph. To check this claim, the following radar measurements of
the speeds (in mph) were obtained for 10 vehicles traveling on this stretch of the interstate highway.


66 74 79 80 69 77 78 65 79 81


Do the data provide sufficient evidence to indicate that the mean speed at which people travel on this
stretch of highway is more than 75 mph? Test the appropriate hypothesis using $\alpha = 0.01$. Draw a box plot and
normal plot for this data, and comment.


Solution
We need to test

$$H_0: \mu = 75 \quad \text{vs.} \quad H_a: \mu > 75.$$

For this sample, the sample mean is $\bar{x} = 74.8$ mph and the sample standard deviation is $s = 5.9963$ mph. Hence,
the observed test statistic is

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{74.8 - 75}{5.9963/\sqrt{10}} = -0.10547.$$

From the t-table, $t_{0.01,9} = 2.821$. Hence, the rejection region is $\{t > 2.821\}$.

Because $t = -0.10547$ does not fall in the rejection region, we do not reject the null hypothesis at $\alpha = 0.01$.
Note that we assumed that the vehicles were randomly selected and that the collected data follow the normal
distribution. Because of the small sample size, $n < 30$, we use the t-test.

Figures 7.1 and 7.2 are the box plot and the normal plot of the data, respectively.

FIGURE 7.1 Box plot of speed data.

FIGURE 7.2 Normal probability plot for speed (ML estimates: mean 74.8, standard deviation 5.68858).

The box plot suggests that there are no outliers present. However, the normal plot indicates that the normality
assumption for this data set is not justified. Hence, it may be more appropriate to do a nonparametric test.
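Since Example 7.4.4 starts from raw data, the summary statistics themselves can be recomputed. The sketch below uses the standard-library `statistics` module; the critical value 2.821 is the table value quoted in the example and is entered as a constant, because the standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

speeds = [66, 74, 79, 80, 69, 77, 78, 65, 79, 81]   # mph, Example 7.4.4
mu0 = 75.0
t_crit = 2.821     # t_{0.01,9}, taken from the t-table as in the example

n = len(speeds)
xbar = mean(speeds)
s = stdev(speeds)       # sample standard deviation (divisor n - 1)
s_ml = pstdev(speeds)   # divisor n, the "ML estimate" shown with Figure 7.2

t = (xbar - mu0) / (s / sqrt(n))
reject = t > t_crit     # upper tail rejection region
print(f"xbar = {xbar}, s = {s:.4f}, s_ml = {s_ml:.5f}, t = {t:.5f}")
```

The two standard deviations differ only in the divisor ($n - 1$ versus $n$); the test statistic uses the $n - 1$ version.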
Example 7.4.5
In attempting to control the strength of the wastes discharged into a nearby river, an industrial firm has
taken a number of restorative measures. The firm believes that they have lowered the oxygen consuming
power of their wastes from a previous mean of 450 manganate in parts per million. To test this belief,
readings are taken on $n = 20$ successive days. A sample mean of 312.5 and the sample standard deviation
106.23 are obtained. Assume that these 20 values can be treated as a random sample from a normal
population. Test the appropriate hypothesis. Use $\alpha = 0.05$.


Solution
Here we need to test the following hypothesis:

$$H_0: \mu = 450 \quad \text{vs.} \quad H_a: \mu < 450.$$

Given $n = 20$, $\bar{x} = 312.5$, and $s = 106.23$. The observed test statistic is

$$t = \frac{312.5 - 450}{106.23/\sqrt{20}} = -5.79.$$

The rejection region for $\alpha = 0.05$ and with 19 degrees of freedom is the set of t-values such that
$\{t < -t_{0.05,19}\} = \{t < -1.729\}$.


Decision: Because $t = -5.79$ is less than $-1.729$, reject $H_0$. There is sufficient evidence to confirm the
firm's belief.
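When only summary statistics are available, as in Example 7.4.5, the observed $t$ follows directly from the formula. A minimal sketch follows; the helper name `one_sample_t` is hypothetical, and the critical value 1.729 is the table value quoted in the example.

```python
from math import sqrt

def one_sample_t(xbar, s, n, mu0):
    """Observed t statistic for H0: mu = mu0 from summary statistics."""
    return (xbar - mu0) / (s / sqrt(n))

t = one_sample_t(xbar=312.5, s=106.23, n=20, mu0=450.0)
t_crit = 1.729           # t_{0.05,19} from the t-table
reject = t < -t_crit     # lower tail rejection region
print(f"t = {t:.2f}, reject H0: {reject}")
```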
For large random samples, the following procedure is used to perform tests of hypotheses about the 
population proportion, p. 



Example 7.4.6
A machine is considered to be unsatisfactory if it produces more than 8% defectives. It is suspected that the
machine is unsatisfactory. A random sample of 120 items produced by the machine contains 14 defectives.
Does the sample evidence support the claim that the machine is unsatisfactory? Use $\alpha = 0.01$.


Solution

Let $Y$ be the number of observed defectives. This follows a binomial distribution. However, because $np_0$ and
$nq_0$ are greater than 5, we can use a normal approximation to the binomial to test the hypothesis. So we
need to test $H_0: p = 0.08$ versus $H_a: p > 0.08$. Let the point estimate of $p$ be $\hat{p} = Y/n = 14/120 = 0.117$, the
sample proportion. Then the value of the TS is

$$z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}} = \frac{0.117 - 0.08}{\sqrt{\dfrac{(0.08)(0.92)}{120}}} = 1.49.$$

For $\alpha = 0.01$, $z_{0.01} = 2.33$. Hence, the rejection region is $\{z > 2.33\}$.

Decision: Because 1.49 is not greater than 2.33, we do not reject $H_0$. We conclude that the evidence does
not support the claim that the machine is unsatisfactory.
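The proportion test of Example 7.4.6 can be sketched in the same way. The code below uses only the standard library, checks the rule of thumb $np_0 > 5$ and $nq_0 > 5$ before applying the normal approximation, and keeps $\hat{p}$ unrounded; all names are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

# Data from Example 7.4.6
n, defectives, p0, alpha = 120, 14, 0.08, 0.01
q0 = 1 - p0

# Rule of thumb for using the normal approximation to the binomial
if not (n * p0 > 5 and n * q0 > 5):
    raise ValueError("normal approximation not recommended for this n and p0")

p_hat = defectives / n                      # sample proportion
z = (p_hat - p0) / sqrt(p0 * q0 / n)        # observed test statistic
z_alpha = NormalDist().inv_cdf(1 - alpha)   # critical value z_{0.01}
reject = z > z_alpha                        # upper tail rejection region
p_value = 1 - NormalDist().cdf(z)

print(f"p_hat = {p_hat:.4f}, z = {z:.3f}, p-value = {p_value:.4f}")
```

Using the unrounded $\hat{p} = 14/120$ gives a statistic slightly below the one obtained from the rounded value 0.117; the decision is unaffected.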


SUMMARY OF LARGE SAMPLE HYPOTHESIS TEST FOR $p$
To test
$$H_0: p = p_0$$
versus
$$H_a: \begin{cases} p > p_0, & \text{upper tail test} \\ p < p_0, & \text{lower tail test} \\ p \neq p_0, & \text{two-tailed test.} \end{cases}$$

Test statistic:
$$Z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}}, \quad \text{where } \sigma_{\hat{p}} = \sqrt{\frac{p_0 q_0}{n}} \text{ and } q_0 = 1 - p_0.$$

Rejection region:
$$\begin{cases} z > z_{\alpha}, & \text{upper tail RR} \\ z < -z_{\alpha}, & \text{lower tail RR} \\ |z| > z_{\alpha/2}, & \text{two tail RR,} \end{cases}$$

where $z$ is the observed test statistic.
Assumption: $n$ is large. A good rule of thumb is to use the normal approximation to the binomial
distribution only when $np_0$ and $n(1 - p_0)$ are both greater than 5.


Decision: Reject $H_0$, if the observed test statistic falls in the RR and conclude that $H_a$ is true with
$(1 - \alpha)100\%$ confidence. Otherwise, do not reject $H_0$ because there is not enough evidence to
conclude that $H_a$ is true for given $\alpha$ and more data are needed.


Note that this is an approximate test, and the test can be improved by increasing the sample size.


Now we give the procedure for testing the population variance when the samples come from a normal 
population. 


SUMMARY OF HYPOTHESIS TEST FOR THE VARIANCE $\sigma^2$
To test
$$H_0: \sigma^2 = \sigma_0^2$$
versus
$$H_a: \begin{cases} \sigma^2 > \sigma_0^2, & \text{upper tail test} \\ \sigma^2 < \sigma_0^2, & \text{lower tail test} \\ \sigma^2 \neq \sigma_0^2, & \text{two-tailed test.} \end{cases}$$

Test statistic:
$$\chi^2 = \frac{(n-1)S^2}{\sigma_0^2}$$
where $S^2$ is the sample variance.
Observed value of test statistic:
$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2}$$
Rejection region:
$$\begin{cases} \chi^2 > \chi^2_{\alpha, n-1}, & \text{upper tail RR} \\ \chi^2 < \chi^2_{1-\alpha, n-1}, & \text{lower tail RR} \\ \chi^2 > \chi^2_{\alpha/2, n-1} \text{ or } \chi^2 < \chi^2_{1-\alpha/2, n-1}, & \text{two tail RR} \end{cases}$$

where $\chi^2_{\alpha, n-1}$ is such that the area under the chi-square distribution with $(n - 1)$ degrees of freedom to its
right is equal to $\alpha$.


Assumption: Sample comes from a normal population.
Decision: Reject $H_0$, if the observed test statistic falls in the RR and conclude that $H_a$ is true with
$(1 - \alpha)100\%$ confidence. Otherwise, do not reject $H_0$ because there is not enough evidence to conclude
that $H_a$ is true for given $\alpha$ and more data are needed.


Because the chi-square distribution is not symmetric, the "equal tails" used for the two-sided
alternative may not be the best procedure. However, in real-world problems we seldom use a two tail test
for the population variance.


Example 7.4.7
A physician claims that the variance in cholesterol levels of adult men in a certain laboratory is at least 100.
A random sample of 25 adult males from this laboratory produced a sample standard deviation of
cholesterol levels as 12. Test the physician's claim at 5% level of significance.


Solution
To test

$$H_0: \sigma^2 = 100 \quad \text{versus} \quad H_a: \sigma^2 < 100$$

for $\alpha = 0.05$, and 24 degrees of freedom, the rejection region is

$$RR = \{\chi^2 < \chi^2_{0.95, 24}\} = \{\chi^2 < 13.848\}.$$

The observed value of the TS is

$$\chi^2 = \frac{(n-1)s^2}{\sigma_0^2} = \frac{(24)(144)}{100} = 34.56.$$

Because the value of the test statistic does not fall in the rejection region, we cannot reject $H_0$ at 5% level
of significance. Here, we assumed that the 25 cholesterol measurements follow the normal distribution.
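The chi-square statistic of Example 7.4.7 is simple arithmetic, but the critical value normally comes from a table. Because the Python standard library has no chi-square distribution, the sketch below approximates the lower-tail cutoff with the Wilson-Hilferty formula; this approximation is an assumption introduced here for illustration and is not the method used in the text. The helper name `chi2_quantile_wh` is hypothetical.

```python
from statistics import NormalDist

def chi2_quantile_wh(p, df):
    """Approximate p-th quantile of chi-square(df) (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * df)
    return df * (1.0 - c + z * c ** 0.5) ** 3

# Data from Example 7.4.7
n, s, sigma0_sq, alpha = 25, 12.0, 100.0, 0.05

chi2_obs = (n - 1) * s**2 / sigma0_sq          # observed test statistic
# Lower tail cutoff: area alpha to its left, i.e. area 1 - alpha to its right
lower_crit = chi2_quantile_wh(alpha, n - 1)
reject = chi2_obs < lower_crit

print(f"chi2 = {chi2_obs:.2f}, approx. cutoff = {lower_crit:.2f}, reject H0: {reject}")
```

For 24 degrees of freedom the approximation agrees with tabulated chi-square values to about two decimal places, which is ample here since the observed statistic is far from the cutoff.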


EXERCISES 7.4 


7.4.1. A random sample of 50 measurements resulted in a sample mean of 62 with a sample 
standard deviation 8. It is claimed that the true population mean is at least 64. 
(a) Is there sufficient evidence to refute the claim at the 2% level of significance? 
(b) What is the p-value? 
(c) What is the smallest value of $\alpha$ for which the claim will be rejected?


7.4.2. A machine in a certain factory must be repaired if it produces more than 12% defectives 
among the large lot of items it produces in a week. A random sample of 175 items from 
a week’s production contains 45 defectives, and it is decided that the machine must be 
repaired. 
(a) Does the sample evidence support this decision? Use $\alpha = 0.02$.
(b) Compute the p-value. 


7.4.3. A random sample of 78 observations produced the following sums:
$$\sum_{i=1}^{78} x_i = 228, \qquad \sum_{i=1}^{78} (x_i - \bar{x})^2 = 2.05.$$


(a) Test the null hypothesis that $\mu = 0.45$ against the alternative hypothesis that $\mu < 0.45$
using $\alpha = 0.01$. Also find the p-value.

(b) Test the null hypothesis that $\mu = 0.45$ against the alternative hypothesis that $\mu \neq 0.45$
using $\alpha = 0.01$. Also find the p-value.

(c) What assumptions did you make for solving (a) and (b)? 


7.4.4. Consider the test $H_0: \mu = 35$ vs. $H_a: \mu > 35$ for a population that is normally distributed.

(a) A random sample of 18 observations taken from this population produced a sample
mean of 40 and a sample standard deviation of 5. Using $\alpha = 0.025$, would you reject
the null hypothesis?

(b) Another random sample of 18 observations produced a sample mean of 36.8 and
a sample standard deviation of 6.9. Using $\alpha = 0.025$, would you reject the null
hypothesis? 

(c) Compare and discuss the decisions of parts (a) and (b). 


7.4.5. According to the information obtained from a large university, professors there earned an 
average annual salary of $55,648 in 1998. A recent random sample of 15 professors from 
this university showed that they earn an average annual salary of $58,800 with a sample 
standard deviation of $8300. Assume that the annual salaries of all the professors in this 
university are normally distributed.
(a) Suppose the probability of making a type I error is chosen to be zero. Without performing
all the steps of test of hypothesis, would you accept or reject the null hypothesis
that the current mean annual salary of all professors at this university is $55,648? 

(b) Using the 1% significance level, can you conclude that the current mean annual salary 
of professors at this university is more than $55,648? 


7.4.6. A check-cashing service company found that approximately 7% of all checks submitted to the
service were without sufficient funds. After instituting a random check verification system to
reduce its losses, the service company found that only 70 were rejected in a random sample of
1125 that were cashed. Is there sufficient evidence that the check verification system reduced
the proportion of bad checks at $\alpha = 0.01$? What is the p-value associated with the test? What
would you conclude at the $\alpha = 0.05$ level?


7.4.7. A manufacturer of washers provides a particular model in one of three colors, white, black, 
or ivory. Of the first 1500 washers sold, it is noticed that 550 were of ivory color. Would 
you conclude that customers have a preference for the ivory color? Justify your answer. Use
$\alpha = 0.01$.


7.4.8. A test of the breaking strength of six ropes manufactured by a company showed a mean 
breaking strength of 6425 lb and a standard deviation of 120 lb. However, the manufacturer 
claimed a mean breaking strength of 7500 lb. 

(a) Can we support the manufacturer's claim at a level of significance of 0.10? 
(b) Compute the p-value. What assumptions did you make for this problem? 


7.4.9. A sample of 10 observations taken from a normally distributed population produced the 
following data: 


44 31 52 48 46 39 43 36 41 49 


(a) Test the hypothesis that $H_0: \mu = 44$ vs. $H_a: \mu \neq 44$ using $\alpha = 0.10$. Draw a box plot
and normal plot for this data, and comment.

(b) Find a 90% confidence interval for the population mean $\mu$.

(c) Discuss the meanings of (a) and (b). What can we conclude? 


7.4.10. The principal of a charter school in Tampa believes that the IQs of its students are above 
the national average of 100. From the past experience, IQ is normally distributed with a 
standard deviation of 10. A random sample of 20 students is selected from this school and 
their IQs are observed. The following are the observed values. 


95 91 10 93 133 119 113 107 110 89 
113 100 100 124 116 113 110 106 115 113


(a) Test for the normality of the data.
(b) Do the IQs of students at the school run above the national average at $\alpha = 0.01$?


7.4.11. In order to find out whether children with chronic diarrhea have the same average
hemoglobin level (Hb) that is normally seen in healthy children in the same area, a random
sample of 10 children with chronic diarrhea are selected and their Hb levels (g/dL) are
obtained as follows.


12.3 11.4 14.2 15.3 14.8 13.8 11.1 15.1 15.8 13.2 


Do the data provide sufficient evidence to indicate that the mean Hb level for children with 
chronic diarrhea is less than that of the normal value of 14.6 g/dL? Test the appropriate 
hypothesis using $\alpha = 0.01$. Draw a box plot and normal plot for this data, and comment.


7.4.12. A company that manufactures precision special-alloy steel shafts claims that the variance in 
the diameters of shafts is no more than 0.0003. A random sample of 10 shafts gave a sample
variance of 0.00027. At the 5% level of significance, test whether the company’s claim can 
be substantiated. 


7.4.13. It was claimed that the average annual expenditures per consumer unit had continued to 
rise, as measured by the Consumer Price Index annual averages (Bureau of Labor Statistics 
report, 1995). To test this claim, 100 consumer units were randomly selected in 1995 and 
found to have an average annual expenditure of $32,277 with a standard deviation of $1200. 
Assuming that the average annual expenditure of all consumer units was $30,692 in 1994, 
test at the 5% significance level whether the annual expenditure per consumer unit had 
really increased from 1994 to 1995. 


7.4.14. It is claimed that two of three Americans say that the chances of world peace are seriously 
threatened by the nuclear capabilities of other countries. If in a random sample of 400 
Americans, it is found that only 252 hold this view, do you think the claim is correct? Use 
$\alpha = 0.05$. State any assumptions you make in solving this problem.


7.4.15. According to the Bureau of Labor Statistics (1996), the average price of a gallon of gasoline
in all U.S. cities in January 1996 was $1.129. A later random sample in
24 cities found the mean price to be $1.24 with a standard deviation of 0.01. Test at $\alpha = 0.05$
to see whether the average price of a gallon of gas in the cities had recently changed.


7.4.16. A manufacturer claims that the mean life of batteries manufactured by his company is at 
least 44 months. A random sample of 40 of these batteries was tested, resulting in a sample 
mean life of 41 months with a sample standard deviation of 16 months. Test at $\alpha = 0.01$
whether the manufacturer's claim is correct. 


7.5 TESTING OF HYPOTHESES FOR TWO SAMPLES 


In this section we study the hypothesis testing procedures for comparing the means and variances 
of two populations. For example, suppose that we want to determine whether a particular drug is 
effective for a certain illness. The sample subjects will be randomly selected from a large pool of 
people with that particular illness and will be assigned randomly to the two groups. To one group 
we will administer a placebo; to the other we will administer the drug of interest. After a period of 
time, we measure a physical characteristic, say the blood pressure, of each subject that is an indicator 
of the severity of the illness. The question is whether the drug can be considered effective on the 
population from which our samples have been selected. We will consider the cases of independent 
and dependent samples. 
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7.5.1 Independent Samples 


Two random samples are drawn independently of each other from two populations, and the sample
information is obtained. We are interested in testing a hypothesis about the difference of the true
means. Let $X_{11}, \ldots, X_{1n_1}$ be a random sample from population 1 with mean $\mu_1$ and variance $\sigma_1^2$, and
$X_{21}, \ldots, X_{2n_2}$ be a random sample from population 2 with mean $\mu_2$ and variance $\sigma_2^2$. Let $\bar{X}_i$, $i = 1, 2$,
represent the respective sample means and $S_i^2$, $i = 1, 2$, represent the sample variances. In this case,
we shall consider the following three cases in testing hypotheses about $\mu_1$ and $\mu_2$: (i) when $\sigma_1^2$ and $\sigma_2^2$
are known, (ii) when $\sigma_1^2$ and $\sigma_2^2$ are unknown and $n_1 \ge 30$ and $n_2 \ge 30$, and (iii) when $\sigma_1^2$ and $\sigma_2^2$ are
unknown and $n_1 < 30$ and $n_2 < 30$. In case (iii) we have the following two possibilities, (a) $\sigma_1^2 = \sigma_2^2$,
and (b) $\sigma_1^2 \neq \sigma_2^2$.


In the large sample case, knowledge of population variances $\sigma_1^2$ and $\sigma_2^2$ does not make much difference.
If the population variances are unknown, we could replace them with sample variances as an
approximation. If both $n_1 \ge 30$ and $n_2 \ge 30$ (large sample case), we can use normal approximation.
The following box summarizes the hypothesis testing procedure for the difference of means
in the large sample case.


SUMMARY OF HYPOTHESIS TEST FOR $\mu_1 - \mu_2$ FOR LARGE SAMPLES ($n_1$ AND $n_2 \ge 30$)
To test
$$H_0: \mu_1 - \mu_2 = D_0$$
versus
$$H_a: \begin{cases} \mu_1 - \mu_2 > D_0, & \text{upper tailed test} \\ \mu_1 - \mu_2 < D_0, & \text{lower tailed test} \\ \mu_1 - \mu_2 \neq D_0, & \text{two-tailed test.} \end{cases}$$

The test statistic is
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}.$$
Replace $\sigma_i$ by $S_i$, if $\sigma_i$, $i = 1, 2$, are not known.
Rejection region is
$$RR: \begin{cases} z > z_{\alpha}, & \text{upper tail RR} \\ z < -z_{\alpha}, & \text{lower tail RR} \\ |z| > z_{\alpha/2}, & \text{two tail RR,} \end{cases}$$
where $z$ is the observed test statistic given by
$$z = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}.$$

Assumption: The samples are independent and $n_1$ and $n_2 \ge 30$.
Decision: Reject $H_0$, if test statistic falls in the RR and conclude that $H_a$ is true with $(1 - \alpha)100\%$ confidence.
Otherwise, do not reject $H_0$ because there is not enough evidence to conclude that $H_a$ is true for given $\alpha$
and more experiments are needed.


Example 7.5.1
In a salary equity study of faculty at a certain university, sample salaries of 50 male assistant professors and
50 female assistant professors yielded the following basic statistics.


                              Sample mean salary    Sample standard deviation
Male assistant professor      $36,400               360
Female assistant professor    $34,200               220


Test the hypothesis that the mean salary of male assistant professors is more than the mean salary of female
assistant professors at this university. Use $\alpha = 0.05$.


Solution
Let $\mu_1$ be the true mean salary for male assistant professors and $\mu_2$ be the true mean salary for female
assistant professors at this university. To test

$$H_0: \mu_1 - \mu_2 = 0 \quad \text{vs.} \quad H_a: \mu_1 - \mu_2 > 0$$

the test statistic is

$$z = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{36{,}400 - 34{,}200}{\sqrt{\dfrac{(360)^2}{50} + \dfrac{(220)^2}{50}}} = 36.872.$$

The rejection region for $\alpha = 0.05$ is $\{z > 1.645\}$.
Because $z = 36.872 > 1.645$, we reject the null hypothesis at $\alpha = 0.05$. We conclude that the mean salary of
male assistant professors at this university is higher than that of female assistant professors for $\alpha = 0.05$.
Note that even though $\sigma_1^2$ and $\sigma_2^2$ are unknown, because $n_1 \ge 30$ and $n_2 \ge 30$, we could replace $\sigma_1^2$ and
$\sigma_2^2$ by the respective sample variances. We are assuming that the salaries of male and female assistant professors are sampled
independently of each other.
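The two-sample statistic of Example 7.5.1 can be verified with a few lines of standard-library Python. This is a sketch; the variable names are illustrative, and the sample standard deviations stand in for the unknown population values as the example describes.

```python
from math import sqrt
from statistics import NormalDist

# Summary data from Example 7.5.1 (group 1: male, group 2: female)
n1, xbar1, s1 = 50, 36400.0, 360.0
n2, xbar2, s2 = 50, 34200.0, 220.0
d0, alpha = 0.0, 0.05

standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
z = (xbar1 - xbar2 - d0) / standard_error
z_alpha = NormalDist().inv_cdf(1 - alpha)   # critical value z_{0.05}
reject = z > z_alpha                        # upper tail rejection region

print(f"z = {z:.3f}, z_alpha = {z_alpha:.3f}, reject H0: {reject}")
```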


Given next is the procedure we follow to compare the true means from two independent normal
populations when $n_1$ and $n_2$ are small ($n_1 < 30$ or $n_2 < 30$) and we can assume homogeneity in the
population variances, that is, $\sigma_1^2 = \sigma_2^2$. In this case, we pool the sample variances to obtain a point
estimate of the common variance.

COMPARISON OF TWO POPULATION MEANS, SMALL SAMPLE CASE (POOLED t-TEST)
To test
$$H_0: \mu_1 - \mu_2 = D_0$$
versus
$$H_a: \begin{cases} \mu_1 - \mu_2 > D_0, & \text{upper tailed test} \\ \mu_1 - \mu_2 < D_0, & \text{lower tailed test} \\ \mu_1 - \mu_2 \neq D_0, & \text{two-tailed test.} \end{cases}$$

The test statistic is
$$T = \frac{\bar{X}_1 - \bar{X}_2 - D_0}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}.$$

Here the pooled sample variance is
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$

Then the rejection region is
$$RR: \begin{cases} t > t_{\alpha}, & \text{upper tailed test} \\ t < -t_{\alpha}, & \text{lower tail test} \\ |t| > t_{\alpha/2}, & \text{two-tailed test} \end{cases}$$

where $t$ is the observed test statistic and $t_{\alpha}$ is based on $(n_1 + n_2 - 2)$ degrees of freedom, and such that
$P(T > t_{\alpha}) = \alpha$.

Decision: Reject $H_0$, if test statistic falls in the RR and conclude that $H_a$ is true with $(1 - \alpha)100\%$ confidence.
Otherwise, do not reject $H_0$ because there is not enough evidence to conclude that $H_a$ is true for given $\alpha$.


Assumptions: The samples are independent and come from normal populations with means $\mu_1$ and $\mu_2$,
and with (unknown) but equal variances, that is, $\sigma_1^2 = \sigma_2^2$.


Now we shall consider the case where $\sigma_1^2$ and $\sigma_2^2$ are unknown and cannot be assumed to be equal.
In such a case the following test is often used. For the hypothesis

$$H_0: \mu_1 - \mu_2 = D_0 \quad \text{vs.} \quad H_a: \begin{cases} \mu_1 - \mu_2 > D_0 \\ \mu_1 - \mu_2 < D_0 \\ \mu_1 - \mu_2 \neq D_0 \end{cases}$$

define the test statistic $T_\nu$ as

$$T_\nu = \frac{\bar{X}_1 - \bar{X}_2 - D_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

where $T_\nu$ has approximately a t-distribution with $\nu$ degrees of freedom, and

$$\nu = \frac{\left[(s_1^2/n_1) + (s_2^2/n_2)\right]^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}.$$

The value of $\nu$ will not necessarily be an integer. In that case, we will round it down to the nearest
integer. This method of hypothesis testing with unequal variances is called the Smith-Satterthwaite
procedure. Even though this procedure is not widely used, some simulation studies have shown that
the Smith-Satterthwaite procedure performs well when variances are unequal and it gives results that
are more or less equivalent to those obtained with the pooled t-test when the variances are equal.
However, when the sample sizes are approximately equal, the pooled t-test may still be used. Note
that in addressing the question which of the cases (iii)(a) or (iii)(b) to use in a given problem, we
suggest that if the point estimates $S_1^2$ of $\sigma_1^2$, and $S_2^2$ of $\sigma_2^2$ are approximately the same, then it is logical
to assume homogeneity, $\sigma_1^2 = \sigma_2^2$, and use (iii)(a), whereas if $S_1^2$ and $S_2^2$ are significantly different we
use (iii)(b). More appropriately, we have tests that can be used to test hypotheses concerning $\sigma_1^2 = \sigma_2^2$
or $\sigma_1^2 \neq \sigma_2^2$, known as the F-test, which we discuss at the end of this subsection.


Example 7.5.2
The intelligence quotients (IQs) of 17 students from one area of a city showed a sample mean of 106 with a
sample standard deviation of 10, whereas the IQs of 14 students from another area chosen independently
showed a sample mean of 109 with a sample standard deviation of 7. Is there a significant difference
between the IQs of the two groups at $\alpha = 0.02$? Assume that the population variances are equal.


Solution
We test

$$H_0: \mu_1 - \mu_2 = 0 \quad \text{vs.} \quad H_a: \mu_1 - \mu_2 \neq 0.$$

Here $n_1 = 17$, $\bar{x}_1 = 106$, and $s_1 = 10$. Also, $n_2 = 14$, $\bar{x}_2 = 109$, and $s_2 = 7$.
We have

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(16)(10)^2 + (13)(7)^2}{29} = 77.138.$$

The test statistic is

$$T = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{106 - 109}{\sqrt{(77.138)\left(\dfrac{1}{17} + \dfrac{1}{14}\right)}} = -0.94644.$$

For $\alpha = 0.02$, $t_{0.01,29} = 2.462$. Hence, the rejection region is $t < -2.462$ or $t > 2.462$.


Because the observed value of the test statistic, $T = -0.94644$, does not fall in the rejection region, there is
not enough evidence to conclude that the mean IQs are different for the two groups. Here we assume that
the two samples are independent and taken from normal populations.
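The pooled t-test calculation of Example 7.5.2 can be sketched as a small function. The name `pooled_t` is hypothetical, and the critical value 2.462 is the table value quoted in the example, entered as a constant because the standard library has no t-distribution.

```python
from math import sqrt

def pooled_t(xbar1, s1, n1, xbar2, s2, n2, d0=0.0):
    """Return (t, pooled variance, degrees of freedom) for the pooled t-test."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled sample variance
    t = (xbar1 - xbar2 - d0) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, sp2, df

# Summary data from Example 7.5.2
t, sp2, df = pooled_t(106, 10, 17, 109, 7, 14)
t_crit = 2.462               # t_{0.01,29} from the t-table
reject = abs(t) > t_crit     # two-tailed rejection region

print(f"sp^2 = {sp2:.3f}, t = {t:.5f}, df = {df}, reject H0: {reject}")
```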


Example 7.5.3
Assume that two populations are normally distributed with unknown and unequal variances. Two independent
samples were drawn from these populations and the data obtained resulted in the following basic
statistics:


$n_1 = 18$, $\bar{x}_1 = 20.17$, $s_1 = 4.3$

$n_2 = 12$, $\bar{x}_2 = 19.23$, $s_2 = 3.8$


Test at the 5% significance level whether the two population means are different.


Solution
We need to test the hypothesis

$$H_0: \mu_1 - \mu_2 = 0 \quad \text{versus} \quad H_a: \mu_1 - \mu_2 \neq 0.$$

Here $n_1 = 18$, $\bar{x}_1 = 20.17$, and $s_1 = 4.3$. Also, $n_2 = 12$, $\bar{x}_2 = 19.23$, and $s_2 = 3.8$.
The degrees of freedom for the t-distribution are given by

$$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\dfrac{(4.3)^2}{18} + \dfrac{(3.8)^2}{12}\right)^2}{\dfrac{\left(\dfrac{(4.3)^2}{18}\right)^2}{17} + \dfrac{\left(\dfrac{(3.8)^2}{12}\right)^2}{11}} = 25.685.$$

Hence, we have $\nu = 25$ degrees of freedom. For $\alpha = 0.05$, $t_{0.025,25} = 2.060$. Thus, the rejection region is
$t < -2.060$ or $t > 2.060$.
The test statistic is given by

$$T_\nu = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{20.17 - 19.23}{\sqrt{\dfrac{(4.3)^2}{18} + \dfrac{(3.8)^2}{12}}} = 0.62939.$$

Because the observed value of the test statistic, $T_\nu = 0.62939$, does not fall in the rejection region, we do not
reject the null hypothesis. At $\alpha = 0.05$ there is not enough evidence to conclude that the population means
are different. Note that the assumptions we made are that the samples are independent and came from two
normal populations. No homogeneity assumption is made.
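The degrees-of-freedom formula of the Smith-Satterthwaite procedure is easy to mistype by hand, so a short numerical check of Example 7.5.3 may help. The function name `smith_satterthwaite` is a hypothetical label; the critical value 2.060 is the table value quoted in the example.

```python
from math import floor, sqrt

def smith_satterthwaite(xbar1, s1, n1, xbar2, s2, n2, d0=0.0):
    """Return (t, nu): unequal-variance statistic and its approximate df."""
    v1, v2 = s1**2 / n1, s2**2 / n2          # estimated variances of the means
    t = (xbar1 - xbar2 - d0) / sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, nu

# Summary data from Example 7.5.3
t, nu = smith_satterthwaite(20.17, 4.3, 18, 19.23, 3.8, 12)
df = floor(nu)               # round down to the nearest integer, as in the text
t_crit = 2.060               # t_{0.025,25} from the t-table
reject = abs(t) > t_crit     # two-tailed rejection region

print(f"t = {t:.5f}, nu = {nu:.3f}, df used = {df}, reject H0: {reject}")
```

Rounding $\nu$ down makes the critical value slightly larger, so the test errs on the conservative side.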


Example 7.5.4 

Infrequent or suspended menstruation can be a symptom of serious metabolic disorders in women. In a 
study to compare the effect of jogging and running on the number of menses, two independent subgroups 
were chosen from a large group of women, who were similar in physical activity (aside from running), 
heights, occupations, distribution of ages, and type of birth control methods being used. The first group 
consisted of a random sample of 26 women joggers who jogged “slow and easy” 5 to 30 miles per week, 
and the second group consisted of a random sample of 26 women runners who ran more than 30 miles per 
week and combined long, slow distance with speed work. The following summary statistics were obtained 
(E. Dale, D. H. Gerlach, and A. L. Wilhite, “Menstrual Dysfunction in Distance Runners,” Obstet. Gynecol. 54,
47-53, 1979).

Joggers: $\bar{x}_1 = 10.1$, $s_1 = 2.1$
Runners: $\bar{x}_2 = 9.1$, $s_2 = 2.4$


Using $\alpha = 0.05$, (a) test for differences in mean number of menses for each group assuming equality of
population variances, and (b) test for differences in mean number of menses for each group assuming 
inequality of population variances. 


Solution 
Here we need to test 


$H_0: \mu_1 - \mu_2 = 0$ versus $H_a: \mu_1 - \mu_2 \neq 0$.


Here, $n_1 = 26$, $\bar{x}_1 = 10.1$, and $s_1 = 2.1$. Also, $n_2 = 26$, $\bar{x}_2 = 9.1$, and $s_2 = 2.4$.


(a) Under the assumption $\sigma_1^2 = \sigma_2^2$ we have

$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(25)(2.1)^2 + (25)(2.4)^2}{50} = 5.085.$$


The test statistic is

$$T = \frac{\bar{X}_1 - \bar{X}_2 - D_0}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},$$

which here, with $D_0 = 0$, takes the value

$$t = \frac{10.1 - 9.1}{\sqrt{5.085}\,\sqrt{\dfrac{1}{26} + \dfrac{1}{26}}} \approx 1.599.$$

For $\alpha = 0.05$, $t_{0.025,50} \approx 1.96$. Hence, the rejection region is $t < -1.96$ or $t > 1.96$. Because
$t \approx 1.599$ does not fall in the rejection region, we do not reject the null hypothesis. At $\alpha = 0.05$
there is not enough evidence to conclude that the population mean numbers of menses for joggers
and runners are different.

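(b) Under the assumption $\sigma_1^2 \neq \sigma_2^2$ we have

$$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{\left(s_1^2/n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2/n_2\right)^2}{n_2 - 1}}
= \frac{\left(\dfrac{(2.1)^2}{26} + \dfrac{(2.4)^2}{26}\right)^2}{\dfrac{\left((2.1)^2/26\right)^2}{25} + \dfrac{\left((2.4)^2/26\right)^2}{25}}
= 49.134.$$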

Hence, we have $\nu = 49$ degrees of freedom. Because this value is large, the rejection region is still
approximately $t < -1.96$ or $t > 1.96$. Hence, the conclusion is the same as that of part (a). In
both parts (a) and (b), we assumed that the samples are independent and came from two normal
populations.
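Part (a) of Example 7.5.4 can be checked with a short Python sketch. The function name `pooled_t` is our own, and the critical value 1.96 is the approximation used in the example, not a computed quantile.

```python
import math


def pooled_t(n1, mean1, s1, n2, mean2, s2, d0=0.0):
    """Pooled-variance t statistic, pooled variance, and degrees of freedom."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t = (mean1 - mean2 - d0) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, sp2, df


t_pooled, sp2, df = pooled_t(26, 10.1, 2.1, 26, 9.1, 2.4)
t_crit = 1.96  # approximate two-sided critical value used in the text
reject = abs(t_pooled) > t_crit
print(f"sp^2 = {sp2:.3f}, t = {t_pooled:.3f}, df = {df}, reject H0: {reject}")
```

Because the two sample sizes are equal here, the pooled statistic and the unequal-variance statistic have the same value; only the degrees of freedom differ between parts (a) and (b).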


Now we present the summary of the test procedure for testing the difference of two proportions, 
inherent in two binomial populations. Here, again we assume that the binomial distribution is 
approximated by the normal distribution and thus it is an approximate test. 


SUMMARY OF HYPOTHESIS TEST FOR $(p_1 - p_2)$ FOR LARGE SAMPLES ($n_i p_i > 5$ AND $n_i q_i > 5$, FOR $i = 1, 2$)
To test
$H_0: p_1 - p_2 = D_0$
versus
$H_a: p_1 - p_2 > D_0$, upper tailed test
$H_a: p_1 - p_2 < D_0$, lower tailed test
$H_a: p_1 - p_2 \neq D_0$, two-tailed test

at significance level $\alpha$, the test statistic is

$$Z = \frac{\hat{p}_1 - \hat{p}_2 - D_0}{\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}},$$

where $z$ is the observed value of $Z$.
The rejection region is

RR: $z > z_{\alpha}$, upper tailed RR
RR: $z < -z_{\alpha}$, lower tailed RR
RR: $|z| > z_{\alpha/2}$, two-tailed RR


Assumption: The samples are independent and
$n_i p_i > 5$ and $n_i q_i > 5$, for $i = 1, 2$.


Decision: Reject $H_0$ if the test statistic falls in the RR and conclude that $H_a$ is true with $(1 - \alpha)100\%$
confidence. Otherwise, do not reject $H_0$, because there is not enough evidence to conclude that $H_a$ is true
for the given $\alpha$, and more experiments are needed.


Example 7.5.5 
Because of the impact of the global economy on a high-wage country such as the United States, it is claimed 
that the domestic content in manufacturing industries fell between 1977 and 1997. A survey of 36 randomly 
picked U.S. companies gave the proportion of domestic content in total manufacturing in 1977 as 0.37 and in
1997 as 0.36. At the 1% level of significance, test the claim that the domestic content really fell during the 
period 1977-1997. 


Solution 
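Let $p_1$ be the domestic content in 1977 and $p_2$ be the domestic content in 1997.
We are given $n_1 = n_2 = 36$, $\hat{p}_1 = 0.37$, and $\hat{p}_2 = 0.36$. We need to test


$H_0: p_1 - p_2 = 0$ vs. $H_a: p_1 - p_2 > 0$.


The test statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1\hat{q}_1}{n_1} + \dfrac{\hat{p}_2\hat{q}_2}{n_2}}}
= \frac{0.37 - 0.36}{\sqrt{\dfrac{(0.37)(0.63)}{36} + \dfrac{(0.36)(0.64)}{36}}} = 0.08813.$$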


For $\alpha = 0.01$, $z_{0.01} = 2.325$. Hence, the rejection region is $z > 2.325$.
Because the observed value of the test statistic does not fall in the rejection region, at $\alpha = 0.01$, there is not
enough evidence to conclude that the domestic content in manufacturing industries fell between 1977 and
1997.
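As a rough check of Example 7.5.5, the following Python sketch computes the z statistic, the upper-tail critical value, and a p-value. The helper name `two_proportion_z` is our own. The critical value from `statistics.NormalDist` is about 2.326, which is close to the tabulated 2.325 used above.

```python
import math
from statistics import NormalDist


def two_proportion_z(p1_hat, n1, p2_hat, n2, d0=0.0):
    """Large-sample z statistic for H0: p1 - p2 = d0 (unpooled standard error)."""
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return (p1_hat - p2_hat - d0) / se


z = two_proportion_z(0.37, 36, 0.36, 36)
alpha = 0.01
z_crit = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value z_alpha
p_value = 1 - NormalDist().cdf(z)         # upper-tailed p-value
reject = z > z_crit
print(f"z = {z:.5f}, z_crit = {z_crit:.3f}, p-value = {p_value:.3f}, reject H0: {reject}")
```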


Let $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ be two independent random samples from two normal populations
with sample variances $s_1^2$ and $s_2^2$, respectively. The problem here is of testing for the equality of the
variances, $H_0: \sigma_1^2 = \sigma_2^2$. We have already seen in Chapter 4 that

$$\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$

follows the F-distribution with $\nu_1 = n_1 - 1$ numerator and $\nu_2 = n_2 - 1$ denominator degrees of freedom. Under
the assumption $H_0: \sigma_1^2 = \sigma_2^2$, we have

$$F = \frac{S_1^2}{S_2^2},$$

which has an F-distribution with $(\nu_1, \nu_2)$ degrees of freedom. We summarize the test procedure for
the equality of variances.


TESTING FOR THE EQUALITY OF VARIANCES
To test

$H_0: \sigma_1^2 = \sigma_2^2$

versus

$H_a: \sigma_1^2 > \sigma_2^2$, upper tailed test
$H_a: \sigma_1^2 < \sigma_2^2$, lower tailed test
$H_a: \sigma_1^2 \neq \sigma_2^2$, two-tailed test

at significance level $\alpha$, the test statistic is

$$F = \frac{S_1^2}{S_2^2}.$$

The rejection region is

RR: $f > F_{\alpha}(\nu_1, \nu_2)$, upper tailed RR
RR: $f < F_{1-\alpha}(\nu_1, \nu_2)$, lower tailed RR
RR: $f > F_{\alpha/2}(\nu_1, \nu_2)$ or $f < F_{1-\alpha/2}(\nu_1, \nu_2)$, two-tailed RR

where $f$ is the observed test statistic given by $f = s_1^2/s_2^2$.


Decision: Reject $H_0$ if the test statistic falls in the RR and conclude that $H_a$ is true with $(1 - \alpha)100\%$
confidence. Otherwise, keep $H_0$, because there is not enough evidence to conclude that $H_a$ is true for
a given $\alpha$, and more experiments are needed.


Assumption: 
(i) The two random samples are independent. 
(ii) Both populations are normal. 


Recall from Section 4.2 that in order to find $F_{1-\alpha}(\nu_1, \nu_2)$, we use the identity
$F_{1-\alpha}(\nu_1, \nu_2) = 1/F_{\alpha}(\nu_2, \nu_1)$.
Example 7.5.6 
Consider two independent random samples $X_1, \ldots, X_{n_1}$ from an $N(\mu_1, \sigma_1^2)$ distribution and $Y_1, \ldots, Y_{n_2}$
from an $N(\mu_2, \sigma_2^2)$ distribution. Test $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \neq \sigma_2^2$ for the following basic statistics:


$n_1 = 25$, $\bar{x}_1 = 410$, $s_1^2 = 95$, and $n_2 = 16$, $\bar{x}_2 = 390$, $s_2^2 = 300$


Use $\alpha = 0.20$.


Solution 

Test $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \neq \sigma_2^2$. This is a two-tailed test.

Here the degrees of freedom are $\nu_1 = 24$ and $\nu_2 = 15$. The test statistic is

$$F = \frac{s_1^2}{s_2^2} = \frac{95}{300} = 0.317.$$


From the F-table, $F_{0.10}(24, 15) = 1.90$ and $F_{0.90}(24, 15) = 1/F_{0.10}(15, 24) = 0.56$.


Hence, the rejection region is $F > 1.90$ or $F < 0.56$. Because the observed value of the test statistic, 0.317,
is less than 0.56, we reject the null hypothesis. There is evidence that the population variances are not equal.
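The decision in Example 7.5.6 can be retraced with a minimal Python sketch. Both critical values are table look-ups, not computed quantiles: 1.90 is the value quoted in the example, and 1.78 for $F_{0.10}(15, 24)$ is our assumption, read from a standard F-table. The lower critical value follows from the reciprocal identity recalled above.

```python
def variance_ratio(s1_sq, s2_sq):
    """Observed F statistic: ratio of the two sample variances."""
    return s1_sq / s2_sq


f_obs = variance_ratio(95, 300)

f_upper = 1.90          # F_{0.10}(24, 15), from the F-table
f_010_15_24 = 1.78      # F_{0.10}(15, 24), assumed value from a standard F-table
f_lower = 1 / f_010_15_24  # F_{0.90}(24, 15) via the reciprocal identity

reject = f_obs > f_upper or f_obs < f_lower
print(f"f = {f_obs:.3f}, RR: f > {f_upper:.2f} or f < {f_lower:.2f}, reject H0: {reject}")
```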


7.5.2 Dependent Samples 


We now consider the case where the two random samples are not independent. When two samples 
are dependent (the samples are dependent if one sample is related to the other), then each data 
point in one sample can be coupled in some natural, nonrandom fashion with each data point in 
the second sample. This situation occurs when each individual data point within a sample is paired 
(matched) to an individual data point in the second sample. The pairing may be the result of the 
individual observations in the two samples: (1) representing before and after a program (such as 
weight before and after following a certain diet program), (2) sharing the same characteristic, (3) 
being matched by location, (4) being matched by time, (5) control and experimental, and so forth. 
Let $(X_{1i}, X_{2i})$, for $i = 1, 2, \ldots, n$, be a random sample. $X_{1i}$ and $X_{2j}$ ($i \neq j$) are independent. To test
the significance of the difference between two population means when the samples are dependent,
we first calculate for each pair of scores the difference, $D_i = X_{1i} - X_{2i}$, $i = 1, 2, \ldots, n$, between the
two scores. Let $\mu_D = E(D_i)$. Because the pairs of observations form a random sample, $D_1, \ldots, D_n$ are
independent and identically distributed random variables. If $d_1, \ldots, d_n$ are the observed values of
$D_1, \ldots, D_n$, then we define

$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i \quad\text{and}\quad s_d^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(d_i - \bar{d}\right)^2 = \frac{\sum_{i=1}^{n} d_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} d_i\right)^2}{n - 1}.$$


Now the testing for these $n$ observed differences will proceed as in the case of a single sample. If the
number of differences is large ($n > 30$), large sample inferential methods for the one sample case can
be used for the paired differences. We now summarize the hypothesis testing procedure for small
samples.

SUMMARY OF TESTING FOR MATCHED PAIRS EXPERIMENT

To test

$H_0: \mu_D = d_0$ versus
$H_a: \mu_D > d_0$, upper tail test
$H_a: \mu_D < d_0$, lower tail test
$H_a: \mu_D \neq d_0$, two-tailed test

the test statistic is

$$T = \frac{\bar{D} - d_0}{S_D/\sqrt{n}}$$

(this approximately follows a Student t-distribution with $(n - 1)$ degrees of freedom).
The rejection region is

$t > t_{\alpha,n-1}$, upper tail RR
$t < -t_{\alpha,n-1}$, lower tail RR
$|t| > t_{\alpha/2,n-1}$, two-tailed RR

where $t$ is the observed test statistic.

Assumptions: The differences are approximately normally distributed.

Decision: Reject $H_0$ if the test statistic falls in the RR and conclude that $H_a$ is true with $(1 - \alpha)100\%$
confidence. Otherwise, do not reject $H_0$, because there is not enough evidence to conclude that $H_a$ is true
for a given $\alpha$, and more data are needed.




Example 7.5.7 

A new diet and exercise program has been advertised as a remarkable way to reduce blood glucose levels in
diabetic patients. Ten randomly selected diabetic patients are put on the program, and the results after 1 
month are given by the following table: 


Before | 268 | 225 | 252 | 192 | 307 | 228 | 246 | 298 | 231 | 185 
After | 106 | 186 | 223 | 110 | 203 | 101 | 211 | 176 | 194 | 203 


Do the data provide sufficient evidence to support the claim that the new program reduces blood glucose 
level in diabetic patients? Use a = 0.05. 


Solution 
We need to test the hypothesis 


$H_0: \mu_D = 0$ vs. $H_a: \mu_D < 0$.


First we calculate the difference of each pair given in the following table.


Before | 268 | 225 | 252 | 192 | 307 | 228 | 246 | 298 | 231 | 185
After | 106 | 186 | 223 | 110 | 203 | 101 | 211 | 176 | 194 | 203
Difference (after - before) | -162 | -39 | -29 | -82 | -104 | -127 | -35 | -122 | -37 | 18


From the table, the mean of the differences is $\bar{d} = -71.9$ and the standard deviation is $s_d = 56.2$.

The test statistic is

$$t = \frac{\bar{d} - 0}{s_d/\sqrt{n}} = \frac{-71.9}{56.2/\sqrt{10}} = -4.0457 \approx -4.05.$$

From the t-table, $t_{0.05,9} = 1.833$. Because the observed value of $t = -4.05 < -t_{0.05,9} = -1.833$, we reject
the null hypothesis and conclude that the sample evidence suggests that the new diet and exercise program
is effective.
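The matched pairs computation of Example 7.5.7 can be reproduced from the raw data with the following Python sketch. The variable names are our own, and the critical value 1.833 is copied from the t-table. Working with the unrounded standard deviation gives a t value that differs from the hand calculation only in the third decimal place.

```python
import math
from statistics import mean, stdev

before = [268, 225, 252, 192, 307, 228, 246, 298, 231, 185]
after = [106, 186, 223, 110, 203, 101, 211, 176, 194, 203]

# Differences taken as after - before, as in the example
d = [a - b for a, b in zip(after, before)]
n = len(d)
d_bar = mean(d)   # sample mean of the differences
s_d = stdev(d)    # sample standard deviation (divisor n - 1)

t_stat = (d_bar - 0) / (s_d / math.sqrt(n))
t_crit = 1.833    # t_{0.05,9}, copied from the t-table
reject = t_stat < -t_crit   # lower tail test
print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.1f}, t = {t_stat:.2f}, reject H0: {reject}")
```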


We can also obtain a $(1 - \alpha)100\%$ confidence interval for $\mu_D$ using the formula

$$\left(\bar{D} - t_{\alpha/2}\frac{s_d}{\sqrt{n}},\ \bar{D} + t_{\alpha/2}\frac{s_d}{\sqrt{n}}\right),$$

where $t_{\alpha/2}$ is obtained from the t-table with $(n - 1)$ degrees of freedom. The interpretation of the
confidence interval is identical to the earlier interpretation.




Example 7.5.8 
For the data in Example 7.5.7, obtain a 95% confidence interval for $\mu_D$ and interpret its meaning.


Solution 
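We have already calculated $\bar{d} = -71.9$ and $s_d = 56.2$. From the t-table, $t_{0.025,9} = 2.262$. Hence, a 95%
confidence interval for $\mu_D$ is $(-112.1, -31.7)$. That is, $P(-112.1 < \mu_D < -31.7) = 0.95$. Note that
$\mu_D = \mu_1 - \mu_2$, where, because the differences were taken as after minus before, $\mu_1$ is the mean after
the program and $\mu_2$ is the mean before it. From the confidence limits we can conclude with 95% confidence
that $\mu_2$ is greater than $\mu_1$, that is, $\mu_2 > \mu_1$.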
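The interval in Example 7.5.8 follows directly from the summary statistics, as the following Python sketch shows. The helper name `paired_ci` is our own, and the quantile 2.262 is copied from the t-table.

```python
import math


def paired_ci(d_bar, s_d, n, t_quantile):
    """Confidence interval for mu_D: d_bar -/+ t * s_d / sqrt(n)."""
    half_width = t_quantile * s_d / math.sqrt(n)
    return d_bar - half_width, d_bar + half_width


lower, upper = paired_ci(d_bar=-71.9, s_d=56.2, n=10, t_quantile=2.262)
print(f"95% CI for mu_D: ({lower:.1f}, {upper:.1f})")
```

Since the whole interval lies below zero, it agrees with the rejection of $H_0: \mu_D = 0$ in Example 7.5.7.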


It is interesting to compare the matched pairs test with the corresponding two independent sample
test. One of the natural questions is, why must we take paired differences and then calculate the mean
and standard deviation for the differences? Why can’t we just take the difference of means of each
sample, as we did for independent samples? The answer lies in the fact that $\sigma_{\bar{D}}^2$ need not be equal to
$\sigma_{(\bar{X}_1 - \bar{X}_2)}^2$. Assume that

$$E(X_{ji}) = \mu_j, \quad \mathrm{Var}(X_{ji}) = \sigma_j^2, \quad \text{for } j = 1, 2,$$

and

$$\mathrm{Cov}(X_{1i}, X_{2i}) = \rho\sigma_1\sigma_2,$$

where $\rho$ denotes the assumed common correlation coefficient of the pair $(X_{1i}, X_{2i})$ for $i = 1, 2, \ldots, n$.
Because the values of $D_i$, $i = 1, 2, \ldots, n$, are independent and identically distributed,

$$\mu_D = E(D_i) = E(X_{1i}) - E(X_{2i}) = \mu_1 - \mu_2$$

and

$$\sigma_D^2 = \mathrm{Var}(D_i) = \mathrm{Var}(X_{1i}) + \mathrm{Var}(X_{2i}) - 2\,\mathrm{Cov}(X_{1i}, X_{2i}) = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2.$$

From these calculations,

$$E(\bar{D}) = \mu_D = \mu_1 - \mu_2$$

and

$$\sigma_{\bar{D}}^2 = \mathrm{Var}(\bar{D}) = \frac{\sigma_D^2}{n} = \frac{1}{n}\left(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2\right).$$

Now, if the samples were independent with $n_1 = n_2 = n$,

$$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$$

and

$$\sigma_{(\bar{X}_1 - \bar{X}_2)}^2 = \frac{1}{n}\left(\sigma_1^2 + \sigma_2^2\right).$$

Hence, if $\rho > 0$, then $\sigma_{\bar{D}}^2 < \sigma_{(\bar{X}_1 - \bar{X}_2)}^2$. As a result, we can see that the matched pairs test reduces any
variability introduced by differences in physical factors in comparison to the independent samples
test when $\rho > 0$. It is also important to observe that the normality assumption for the differences does not
imply that the individual samples themselves are normal. Also, in a matched pairs experiment, there
is no need to assume the equality of variances for the two populations. Matching also reduces degrees
of freedom, because in the case of two independent samples, the degrees of freedom is $(n_1 + n_2 - 2)$,
whereas for the case of two dependent samples it is only $(n - 1)$.
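To make the variance comparison concrete, the following Python sketch evaluates the two variance formulas for a few correlations. The values $\sigma_1 = 4$, $\sigma_2 = 5$, and $n = 10$ are hypothetical and chosen only for illustration; the function names are our own.

```python
def var_paired(sigma1, sigma2, rho, n):
    """Variance of D-bar for n matched pairs with correlation rho."""
    return (sigma1 ** 2 + sigma2 ** 2 - 2 * rho * sigma1 * sigma2) / n


def var_independent(sigma1, sigma2, n):
    """Variance of X1-bar minus X2-bar for two independent samples of size n."""
    return (sigma1 ** 2 + sigma2 ** 2) / n


sigma1, sigma2, n = 4.0, 5.0, 10   # hypothetical values, not from the text
v_ind = var_independent(sigma1, sigma2, n)
for rho in (0.0, 0.5, 0.9):
    v_pair = var_paired(sigma1, sigma2, rho, n)
    print(f"rho = {rho}: paired {v_pair:.2f} vs independent {v_ind:.2f}")
```

With $\rho = 0$ the two variances coincide, and the gain from pairing grows as $\rho$ increases. This gain has to be weighed against the loss of degrees of freedom noted above.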


EXERCISES 7.5 


7.5.1. Two sets of elementary school children were taught to read by different methods, 50 by each
method. At the conclusion of the instructional period, a reading test gave results $\bar{y}_1 = 74$,
$\bar{y}_2 = 71$, $s_1 = 9$, and $s_2 = 10$. What is the attained significance level if you wish to see if
there is evidence of a real difference between the two population means? What would you
conclude if you desired an $\alpha$-value of 0.05?


7.5.2. The following information was obtained from two independent samples selected from two 
normally distributed populations with unknown but equal variances. 


Sample 1 | 14 | 15 | 11 | 14 | 10 | 8 | 13 | 10 | 12 | 16 | 15
Sample 2 | 17 | 16 | 21 | 12 | 20 | 18 | 16 | 14 | 21 | 20 | 13 | 20 | 13


Test at the 2% significance level whether $\mu_1$ is lower than $\mu_2$.

7.5.3. In the academic year 1997-1998, two random samples of 25 male professors and 23 female
professors from a large university produced a mean salary for male professors of $58,550 
with a standard deviation of $4000 and an average for female professors of $53,700 with a 
standard deviation of $3200. At the 5% significance level, can you conclude that the mean 
salary of all male professors for 1997-1998 was higher than that of all female professors? 
Assume that the salaries of male and female professors are both normally distributed with 
equal standard deviations. 

7.5.4. It is believed that the effects of smoking differ depending on race. The following table gives 
the results of a statistical study for this question. 

                  | Number in the study | Average number of cigarettes per day | Number of lung cancer cases
Whites            | 400 | 15 | 78
African Americans | 280 | 15 | 70

Do the data indicate that African Americans are more likely to develop lung cancer due to
smoking? Use $\alpha = 0.05$.

7.5.5. A supermarket chain is considering two sources A and B for the purchase of 50-pound bags
of onions. The following table gives the results of a study.

                        | Source A | Source B
Number of bags weighed  | 80 | 100
Mean weight             | 105.9 | 100.5
Sample variance         | 0.21 | 0.19

Test at $\alpha = 0.05$ whether there is a difference in the mean weights.

7.5.6. In order to compare the mean Hemoglobin (Hb) levels of well-nourished and undernourished
groups of children, random samples from each of these groups yielded the following
summary.

               | Number of children | Sample mean | Sample standard deviation
Well nourished | 95 | 11.2 | 0.9
Undernourished | 75 | 9.8 | 1.2

Test at $\alpha = 0.01$ whether the mean Hb levels of well-nourished children were higher than
those of undernourished children.

7.5.7. An aquaculture farm takes water from a stream and returns it after it has circulated through
the fish tanks. In order to find out how much organic matter is left in the waste water after
the circulation, some samples of the water are taken at the intake and other samples are
taken at the downstream outlet and tested for biochemical oxygen demand (BOD). BOD is
a common environmental measure of the quantity of oxygen consumed by microorganisms
during the decomposition of organic matter. If BOD increases, it can be said that the waste
matter contains more organic matter than the stream can handle. The following table gives
data for this problem.


Upstream | 9.0 | 6.8 | 6.5 | 8.0 | 7.7 | 8.6 | 6.8 | 8.9 | 7.2 | 7.0
Downstream | 10.2 | 10.2 | 9.9 | 11.1 | 9.6 | 8.7 | 9.6 | 9.7 | 10.4 | 8.1


Assuming that the samples come from a normal distribution,

(a) Test that the mean BOD for the downstream samples is less than for the samples
upstream at $\alpha = 0.05$. Assume that the variances are equal.

(b) Test for the equality of the variances at $\alpha = 0.05$.

(c) In parts (a) and (b), we assumed samples are independent. Now, we feel this assumption
is not reasonable. Assuming that the difference of each pair is approximately
normal, test that the mean BOD for the downstream samples is less than for the
upstream samples at $\alpha = 0.05$.


7.5.8. Suppose we want to know the effect on driving of a drug for cold and allergy, in a study 
in which the same people were tested twice, once after 1 hour of taking the drug and once 
when no drug is taken. Suppose we obtain the following data, which represent the number 
of cones (placed in a certain pattern) knocked down by each of the nine individuals before 
taking the drug and after an hour of taking the drug. 


Nodrug |0/0/}3/2/0])/0/3)3)1 
Afterdrug|/1/5/6]/5]/5/5)6]1 


Assuming that the difference of each pair is coming from an approximately normal distribu- 
tion, test if there is any difference in the individuals’ driving ability under the two conditions. 
Use $\alpha = 0.05$.


7.5.9. Suppose that we want to evaluate the role of intravenous pulse cyclophosphamide (IVCP)
infusion in the management of nephrotic syndrome in children with steroid resistance.
Children were given a monthly infusion of IVCP in a dose of 500 to 750 mg/m². The
following data (source: S. Gulati and V. Kher, “Intravenous pulse cyclophosphamide: A new
regime for steroid resistant focal segmental glomerulosclerosis,” Indian Pediatr. 37, 2000)
represent levels of serum albumin (g/dL) before and after IVCP in 14 randomly selected
children with nephrotic syndrome.


Pre-IVCP | 2.0 | 2.5 | 1.5 | 2.0 | 2.3 | 2.1 | 2.3 | 1.0 | 2.2 | 1.8 | 2.0 | 2.0 | 1.5 | 3.4
Post-IVCP | 3.5 | 4.3 | 4.0 | 4.0 | 3.8 | 2.4 | 3.5 | 1.7 | 3.8 | 3.6 | 3.8 | 3.8 | 4.1 | 3.4


Assuming that the samples come from a normal distribution:

(a) Test whether the mean Pre-IVCP is less than the mean Post-IVCP at $\alpha = 0.05$. Assume
that the variances are equal.

(b) Test for the equality of the variances at $\alpha = 0.05$.

(c) In parts (a) and (b), we assumed that the samples are independent. Now, we feel
this assumption is not reasonable. Assuming that the difference of each pair is
approximately normal, test that the mean Pre-IVCP is less than the Post-IVCP at
$\alpha = 0.05$.


7.5.10. Show that $S_p^2$ is an unbiased estimator of $\sigma^2$.


7.5.11. Test $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \neq \sigma_2^2$ for the following data.


$n_1 = 10$, $\bar{x}_1 = 71$, $s_1^2 = 64$ and $n_2 = 25$, $\bar{x}_2 = 131$, $s_2^2 = 96$.


Use $\alpha = 0.10$.


7.5.12. The IQs of 17 students from one area of a city showed a mean of 106 with a standard 
deviation of 10, whereas the IQs of 14 students from another area showed a mean of 109 with 
a standard deviation of 7. Test for equality of variances between the IQs of the two groups at 
$\alpha = 0.02$.


7.5.13. The following data give SAT mean scores for math by state for 1989 and 1999 for 20 randomly 
selected states (source: The World Almanac and Book of Facts 2000). 


State 1989 1999 
Arizona 523 525 
Connecticut 498 509 
Alabama 539 555 
Indiana 487 498 
Kansas 561 576 
Oregon 509 525 
Nebraska 560 571 
New York 496 502 
Virginia 507 499 
Washington 515 526 
Illinois 539 585 
North Carolina 469 493 
Georgia 475 482 
Nevada 512 517 
Ohio 520 568 
New Hampshire 510 518 


Assuming that the samples come from a normal distribution:
(a) Test that the mean SAT score for math in 1999 is greater than that in 1989 at $\alpha = 0.05$.
Assume the variances are equal.
(b) Test for the equality of the variances at $\alpha = 0.05$.


7.6 CHI-SQUARE TESTS FOR COUNT DATA 


In this section, we study several commonly used tests for count data. These are basically large sample
tests based on a $\chi^2$-approximation. Suppose that we have outcomes of a multinomial experiment that
consists of $k$ mutually exclusive and exhaustive events $A_1, \ldots, A_k$. Let $P(A_i) = p_i$, $i = 1, 2, \ldots, k$.
Then $\sum_{i=1}^{k} p_i = 1$. Let the experiment be repeated $n$ times, and let $X_i$ ($i = 1, 2, \ldots, k$) represent
the number of times the event $A_i$ occurs. Then $(X_1, \ldots, X_k)$ have a multinomial distribution with
parameters $n, p_1, \ldots, p_k$.
Let

$$Q^2 = \sum_{i=1}^{k} \frac{(X_i - np_i)^2}{np_i}.$$

It can be shown that for large $n$, the random variable $Q^2$ is approximately $\chi^2$-distributed with $(k - 1)$
degrees of freedom. It is usual to demand $np_i \geq 5$ ($i = 1, 2, \ldots, k$) for the approximation to be valid,
although the approximation generally works well if for only a few values of $i$ (about 20%), $np_i \geq 1$
and the rest (about 80%) satisfy the condition $np_i \geq 5$. This statistic was proposed by Karl Pearson
in 1900.


It should be noted that the $\chi^2$-tests that we discuss in this section are approximate tests valid for
large samples. Often $X_i$ is called the observed frequency and is denoted by $O_i$ (this is the observed
value in class $i$), and $np_i$ is called the expected frequency and is denoted by $E_i$ (this is the theoretical
distribution frequency under the null hypothesis). Thus, with these notations, we get

$$Q^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}.$$


Example 7.6.1 

A plant geneticist grows 200 progeny from a cross that is hypothesized to result in a 3:1 phenotypic
ratio of red-flowered to white-flowered plants. Suppose the cross produces 170 red- to 30 white-flowered
plants. Calculate the value of $Q^2$ for this experiment.


Solution

There are two categories of data totaling $n = 200$. Hence, $k = 2$. Let $i = 1$ represent red-flowered and $i = 2$
represent white-flowered plants. Then $O_1 = 170$, and $O_2 = 30$.

Here, $H_0$: The flower color population ratio is not different from 3 : 1, and the alternative is $H_a$: The flower
color population sampled has a flower color ratio that is not 3 red : 1 white.

Under the null hypothesis, the expected frequencies are $E_1 = (200)(3/4) = 150$, and $E_2 = (200)(1/4) = 50$.
Hence,

$$Q^2 = \sum_{i=1}^{2} \frac{(O_i - E_i)^2}{E_i} = \frac{(170 - 150)^2}{150} + \frac{(30 - 50)^2}{50} = 10.667.$$


The type of calculation in Example 7.6.1 gives a measure of how close our observed frequencies come
to the expected frequencies and is referred to as a measure of goodness of fit. Smaller values of $Q^2$
indicate better fit.


One of the most frequent uses of the $\chi^2$-test is in comparison of observed frequencies. Unless the
sample size is exactly 100, percentages cannot be used. These are approximate tests. Let the random
variables $(X_1, \ldots, X_k)$ have a multinomial distribution with parameters $n, p_1, \ldots, p_k$. Let $n$ be known.
We will now present some important tests based on the chi-square statistic.


7.6.1 Testing the Parameters of Multinomial Distribution: Goodness-of-Fit Test


Let an experiment have $k$ mutually exclusive and exhaustive outcomes $A_1, A_2, \ldots, A_k$. We would
like to test the null hypothesis that all the $p_i = P(A_i)$, $i = 1, 2, \ldots, k$, are equal to known numbers
$p_{i0}$, $i = 1, 2, \ldots, k$. We now summarize the test procedure.


TESTING THE PARAMETERS OF A MULTINOMIAL DISTRIBUTION (SUMMARY)
To test

$H_0: p_1 = p_{10}, \ldots, p_k = p_{k0}$

versus

$H_a$: At least one of the probabilities is different from the hypothesized value.

The test is always a one-sided upper tail test.
Let $O_i$ be the observed frequency, $E_i = np_{i0}$ be the expected frequency (frequency under the null
hypothesis), and $k$ be the number of classes. The test statistic is

$$Q^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}.$$

The test statistic $Q^2$ has an approximate chi-square distribution with $k - 1$ degrees of freedom.
The rejection region is

$$Q^2 \geq \chi^2_{\alpha,k-1}.$$

Assumption: $E_i \geq 5$. Exact methods are available. Computing the power of this test is difficult.


This test is known as the goodness-of-fit test. It implies that if the observed data are very close to the
expected data, we have a very good fit and we do not reject the null hypothesis. That is, for small $Q^2$ values,
we do not reject $H_0$.


Example 7.6.2 
A TV station broadcasts a series of programs on the ill effects of smoking marijuana. After the series, the
station wants to know whether people have changed their opinion about legalizing marijuana. Given in the
following tables are the data based on a survey of 500 randomly chosen people:

Before the Series Was Shown

For legalization | Decriminalization | Existing law (fine or imprisonment) | No opinion
7% | 18% | 65% | 10%


After the Series Was Shown

For legalization | Decriminalization | Existing law (fine or imprisonment) | No opinion
39% | 9% | 36% | 16%


Here, $k = 4$, and we wish to test
$H_0: p_1 = 0.07$; $p_2 = 0.18$; $p_3 = 0.65$; $p_4 = 0.10$
versus
$H_a$: At least one of the probabilities is different from the hypothesized value.


The test is always an upper tail test. Test this hypothesis using $\alpha = 0.01$.


Solution 
We have

$E_1 = (500)(0.07) = 35$; $E_2 = 90$; $E_3 = 325$; $E_4 = 50$.

The observed frequencies are
$O_1 = (500)(0.39) = 195$; $O_2 = 45$; $O_3 = 180$; $O_4 = 80$.

The test statistic is

$$Q^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i}
= \frac{(195 - 35)^2}{35} + \frac{(45 - 90)^2}{90} + \frac{(180 - 325)^2}{325} + \frac{(80 - 50)^2}{50}
= 836.62.$$

From the $\chi^2$-table, $\chi^2_{0.01,3} = 11.3449$. Because the test statistic $Q^2 = 836.62 > 11.3449$, we reject $H_0$ at
$\alpha = 0.01$. Hence, the data suggest that people have changed their opinion after the series on the ill effects
of smoking marijuana was shown.

Example 7.6.3
A die is rolled 60 times and the face values are recorded. The results are as follows.


Up face | 1 | 2 | 3 | 4 | 5 | 6
Frequency | 8 | 11 | 5 | 12 | 15 | 9


Is the die balanced? Test using $\alpha = 0.05$.


Solution 
If the die is balanced, we must have

$$p_1 = p_2 = \cdots = p_6 = \frac{1}{6},$$

where $p_i = P(\text{face value on the die is } i)$, $i = 1, 2, \ldots, 6$. This is the discrete uniform distribution.
Hence, we test

$$H_0: p_1 = p_2 = \cdots = p_6 = \frac{1}{6}$$

versus

$H_a$: At least one of the probabilities is different from the hypothesized value of 1/6.

Here $E_1 = np_1 = (60)(1/6) = 10, \ldots, E_6 = 10$.


We summarize the calculations in the following table:


Face value | 1 | 2 | 3 | 4 | 5 | 6
Frequency, $O_i$ | 8 | 11 | 5 | 12 | 15 | 9
Expected value, $E_i$ | 10 | 10 | 10 | 10 | 10 | 10


The test statistic value is given by

$$Q^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i} = \frac{4 + 1 + 25 + 4 + 25 + 1}{10} = 6.$$

From the chi-square table with 5 d.f., $\chi^2_{0.05,5} = 11.070$.


Because the value of the test statistic does not fall in the rejection region, we do not reject $H_0$. Therefore,
there is not enough evidence to conclude that the die is unbalanced.
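The goodness-of-fit calculations in Examples 7.6.2 and 7.6.3 follow the same pattern, which the following Python sketch captures. The function name `chi_square_gof` is our own, and the critical values are copied from the chi-square table rather than computed.

```python
def chi_square_gof(observed, probs):
    """Return Q^2 and the expected counts for hypothesized cell probabilities."""
    n = sum(observed)
    expected = [n * p for p in probs]
    q2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return q2, expected


# Example 7.6.3: a die rolled 60 times, H0: every face has probability 1/6
q2_die, e_die = chi_square_gof([8, 11, 5, 12, 15, 9], [1 / 6] * 6)
reject_die = q2_die >= 11.070      # chi-square critical value, alpha = 0.05, 5 d.f.

# Example 7.6.2: opinions after the series against the "before" proportions
q2_tv, e_tv = chi_square_gof([195, 45, 180, 80], [0.07, 0.18, 0.65, 0.10])
reject_tv = q2_tv >= 11.3449       # chi-square critical value, alpha = 0.01, 3 d.f.

# Rule of thumb from the text: every expected count should be at least 5
assumption_ok = min(e_die + e_tv) >= 5

print(f"die: Q^2 = {q2_die:.2f}, reject H0: {reject_die}")
print(f"TV series: Q^2 = {q2_tv:.2f}, reject H0: {reject_tv}")
```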


7.6.2 Contingency Table: Test for Independence 


One of the uses of the $\chi^2$-statistic is in contingency (dependence) testing where $n$ randomly selected
items are classified according to two different criteria, such as when data are classified on the basis of
two factors (row factor and column factor) where the row factor has $r$ levels and the column factor
has $c$ levels. The obtained data are displayed as shown in the following table, where $n_{ij}$ represents
the number of data values under row $i$ and column $j$. Our interest here is to test for independence of
two methods of classification of observed events. For example, we might classify a sample of students
by sex and by their grade on a statistics course in order to test the hypothesis that the grades are
dependent on sex. More generally the problem is to investigate a dependency (or contingency) between
two classification criteria.


Levels of column factor

Row levels | 1 | 2 | ... | c | Row total
1 | $n_{11}$ | $n_{12}$ | ... | $n_{1c}$ | $n_{1\cdot}$
2 | $n_{21}$ | $n_{22}$ | ... | $n_{2c}$ | $n_{2\cdot}$
... | ... | ... | ... | ... | ...
r | $n_{r1}$ | $n_{r2}$ | ... | $n_{rc}$ | $n_{r\cdot}$
Column total | $n_{\cdot 1}$ | $n_{\cdot 2}$ | ... | $n_{\cdot c}$ | $N$

where $N = \sum_{j=1}^{c} n_{\cdot j} = \sum_{i=1}^{r} n_{i\cdot} = \sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij}$ is the grand total.


We wish to test the hypothesis that the two factors are independent. We summarize the procedure
in the following table for testing that the factor represented by the rows is independent of that
represented by the columns.


TESTING FOR THE INDEPENDENCE OF TWO FACTORS
To test
$H_0$: The factors are independent

versus
$H_a$: The factors are dependent

the test statistic is

$$Q^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},$$

where

$$O_{ij} = n_{ij}$$

and

$$E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{N}.$$

Then under the null hypothesis the test statistic $Q^2$ has an approximate chi-square distribution with
$(r - 1)(c - 1)$ degrees of freedom.
Hence, the rejection region is $Q^2 > \chi^2_{\alpha,(r-1)(c-1)}$.


Assumption: $E_{ij} \geq 5$.
Example 7.6.4 


The following table gives a classification according to religious affiliation and marital status for 500
randomly selected individuals.


Marital status | Religious affiliation: A | B | C | D | None | Total
Single | 39 | 19 | 12 | 28 | 18 | 116
With spouse | 172 | 61 | 44 | 70 | 37 | 384
Total | 211 | 80 | 56 | 98 | 55 | 500


For $\alpha = 0.01$, test the null hypothesis that marital status and religious affiliation are independent.


Solution 
We need to test the hypothesis 


Ho : Marital status and religious affiliation are independent 
versus 
Hq: Marital status and religious affiliation are dependent. 


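Here, $c = 5$ and $r = 2$. For $\alpha = 0.01$, and for $(c - 1)(r - 1) = 4$ degrees of freedom, we have

$$\chi^2_{0.01,4} = 13.2767.$$

Hence, the rejection region is $Q^2 > 13.2767$.

We have $E_{ij} = \dfrac{n_{i\cdot}\, n_{\cdot j}}{N}$. Thus,

$$E_{11} = \frac{(116)(211)}{500} = 48.952; \quad E_{12} = \frac{(116)(80)}{500} = 18.56;$$

$$E_{13} = \frac{(116)(56)}{500} = 12.992; \quad E_{14} = \frac{(116)(98)}{500} = 22.736;$$

$$E_{15} = \frac{(116)(55)}{500} = 12.76; \quad E_{21} = \frac{(384)(211)}{500} = 162.048;$$

$$E_{22} = \frac{(384)(80)}{500} = 61.44; \quad E_{23} = \frac{(384)(56)}{500} = 43.008;$$

and

$$E_{24} = \frac{(384)(98)}{500} = 75.264; \quad E_{25} = \frac{(384)(55)}{500} = 42.24.$$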


The value of the test statistic is

$$Q^2 = \sum_{i=1}^{2}\sum_{j=1}^{5} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

$$= \frac{(39 - 48.952)^2}{48.952} + \frac{(19 - 18.56)^2}{18.56} + \frac{(12 - 12.992)^2}{12.992} + \frac{(28 - 22.736)^2}{22.736} + \frac{(18 - 12.76)^2}{12.76}$$

$$+ \frac{(172 - 162.048)^2}{162.048} + \frac{(61 - 61.44)^2}{61.44} + \frac{(44 - 43.008)^2}{43.008} + \frac{(70 - 75.264)^2}{75.264} + \frac{(37 - 42.24)^2}{42.24}$$

$$\approx 7.135.$$

Because the observed value of $Q^2$ does not fall in the rejection region, we do not reject the null hypothesis
at $\alpha = 0.01$. Therefore, based on the observed data, there is not enough evidence to conclude that
marital status and religious affiliation are dependent.
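The expected counts and the test statistic of Example 7.6.4 can be computed for any $r \times c$ table along the following lines. This is a sketch in plain Python; the function name `chi_square_independence` is our own, and the critical value 13.2767 is copied from the chi-square table.

```python
def chi_square_independence(table):
    """Return Q^2, degrees of freedom, and expected counts for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E_ij = (row total i) * (column total j) / N
    expected = [[ri * cj / grand_total for cj in col_totals] for ri in row_totals]
    q2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return q2, df, expected


# Rows: single, with spouse; columns: affiliations A, B, C, D, none
observed = [
    [39, 19, 12, 28, 18],
    [172, 61, 44, 70, 37],
]
q2, df, expected = chi_square_independence(observed)
reject = q2 > 13.2767   # chi-square critical value, alpha = 0.01, 4 d.f.
print(f"Q^2 = {q2:.4f}, df = {df}, reject H0: {reject}")
```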


7.6.3 Testing to Identify the Probability Distribution: Goodness-of-Fit 
Chi-Square Test 

Another application of the chi-square statistic is using it for goodness-of-fit tests in a different context. 

In hypothesis testing problems we often assume that the form of the population distribution is known. 

For example, in a $\chi^2$-test for variance, we assume that the population is normal. The goodness-of-fit

tests examine the validity of such an assumption if we have a large enough sample. We now describe 

the goodness-of-fit test procedure for such applications. 


GOODNESS-OF-FIT TEST PROCEDURES FOR PROBABILITY DISTRIBUTIONS
Let $X_1, \ldots, X_n$ be a sample from a population with cdf $F(x)$, which may depend on the set of unknown
parameters $\theta$. We wish to test $H_0: F(x) = F_0(x)$, where $F_0(x)$ is completely specified.
1. Divide the range of values of the random variables $X_i$ into $K$ nonoverlapping intervals $I_1, I_2, \ldots, I_K$.
Let $O_j$ be the number of sample values that fall in the interval $I_j$ ($j = 1, 2, \ldots, K$).
2. Assuming the distribution of $X$ to be $F_0(x)$, find $P(X \in I_j)$. Let $P(X \in I_j) = \pi_j$. Let $E_j = n\pi_j$ be the
expected frequency.
3. Compute the test statistic $Q^2$ given by

$$Q^2 = \sum_{j=1}^{K} \frac{(O_j - E_j)^2}{E_j}.$$

The test statistic $Q^2$ has an approximate $\chi^2$-distribution with $(K - 1)$ degrees of freedom.
4. Reject $H_0$ if $Q^2 \geq \chi^2_{\alpha,K-1}$.
5. Assumptions: $E_j \geq 5$, $j = 1, 2, \ldots, K$.


If the null hypothesis does not specify $F_0(x)$ completely, that is, if $F_0(x)$ contains some unknown parameters $\theta_1, \theta_2, \ldots, \theta_p$, we estimate these parameters by the method of maximum likelihood. Using these estimated values we specify $F_0(x)$ completely. Denote the estimated $F_0(x)$ by $\hat{F}_0(x)$. Let

$$\hat{\pi}_j = P\{X \in I_j \mid \hat{F}_0(x)\} \quad \text{and} \quad \hat{E}_j = n\hat{\pi}_j.$$

The test statistic is

$$Q^2 = \sum_{j=1}^{K} \frac{(O_j - \hat{E}_j)^2}{\hat{E}_j}.$$

The statistic $Q^2$ has an approximate chi-square distribution with $(K - 1 - p)$ degrees of freedom. We reject $H_0$ if $Q^2 > \chi^2_{\alpha, (K-1-p)}$.
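The goodness-of-fit procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `gof_statistic`, its argument names, and the die-rolling counts are hypothetical and do not come from the text, and the critical value still has to be read from a chi-square table.

```python
def gof_statistic(observed, probs, n_estimated=0):
    """Return (Q2, df) for the chi-square goodness-of-fit test.

    observed    -- counts O_j for the K intervals
    probs       -- cell probabilities pi_j under H0 (should sum to 1)
    n_estimated -- number p of parameters estimated from the data
    """
    n = sum(observed)
    expected = [n * p for p in probs]            # E_j = n * pi_j
    if min(expected) < 5:
        raise ValueError("every expected frequency should be at least 5")
    q2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1 - n_estimated         # K - 1 - p
    return q2, df


# Hypothetical data: 120 rolls of a die, H0: all six faces equally likely.
counts = [18, 22, 20, 25, 15, 20]
q2, df = gof_statistic(counts, [1 / 6] * 6)
print(f"Q2 = {q2:.2f}, df = {df}")
```

For these made-up counts the statistic would be compared with the tabulated $\chi^2_{0.05,5} = 11.07$. Setting `n_estimated` to $p$ gives the $(K - 1 - p)$ degrees of freedom used when parameters are estimated by maximum likelihood.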


We now illustrate the method of goodness-of-fit with an example. 




Example 7.6.5 


The grades of students in a class of 200 are given in the following table. Test the hypothesis that the grades are normally distributed with a mean of 75 and a standard deviation of 8. Use $\alpha = 0.05$.

Range              | 0-59 | 60-69 | 70-79 | 80-89 | 90-100
Number of students | 12   | 36    | 90    | 44    | 18

Solution 
We have $O_1 = 12$, $O_2 = 36$, $O_3 = 90$, $O_4 = 44$, $O_5 = 18$.
We now compute $\pi_i$ $(i = 1, 2, \ldots, 5)$, using the continuity correction factor,

$$\pi_1 = P\{X \le 59.5 \mid H_0\} = P\left\{Z \le \frac{59.5 - 75}{8}\right\} = 0.0262,$$

$$\pi_2 = 0.2189, \quad \pi_3 = 0.4722, \quad \pi_4 = 0.2476, \quad \pi_5 = 0.0351,$$

and

$$E_1 = 5.24, \quad E_2 = 43.78, \quad E_3 = 94.44, \quad E_4 = 49.52, \quad E_5 = 7.02.$$

The test statistic results in

$$Q^2 = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i} = \frac{(12 - 5.24)^2}{5.24} + \frac{(36 - 43.78)^2}{43.78} + \frac{(90 - 94.44)^2}{94.44} + \frac{(44 - 49.52)^2}{49.52} + \frac{(18 - 7.02)^2}{7.02} = 28.10.$$

$Q^2$ has an approximate chi-square distribution with $(5 - 1) = 4$ degrees of freedom. The critical value is $\chi^2_{0.05,4} = 9.488$. Hence, the rejection region is $Q^2 > 9.488$. Because the observed value of $Q^2 = 28.10 > 9.488$, we reject $H_0$ at $\alpha = 0.05$. Thus, we conclude that the grades are not normally distributed with a mean of 75 and a standard deviation of 8.
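As a cross-check of Example 7.6.5, the cell probabilities can be computed directly from the normal cdf instead of a table. The sketch below assumes the class boundaries 59.5, 69.5, 79.5, and 89.5 implied by the continuity correction; the helper name `normal_cdf` is ours. Because the cdf is evaluated without table rounding, the expected frequencies and $Q^2$ may differ slightly from the hand computation, but the statistic remains far above the tabulated $\chi^2_{0.05,4} = 9.488$.

```python
import math


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


observed = [12, 36, 90, 44, 18]
cuts = [59.5, 69.5, 79.5, 89.5]      # assumed class boundaries (continuity correction)
mu, sigma = 75.0, 8.0
n = sum(observed)

# Cumulative probabilities at the boundaries, padded with 0 and 1 for the end cells.
cdf = [0.0] + [normal_cdf(c, mu, sigma) for c in cuts] + [1.0]
probs = [hi - lo for lo, hi in zip(cdf, cdf[1:])]   # pi_i for each class
expected = [n * p for p in probs]                   # E_i = n * pi_i
q2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print([round(e, 2) for e in expected], round(q2, 2))
```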



EXERCISES 7.6 


7.6.1. The following table gives the opinion on collective bargaining by a random sample of 200 employees of a school system, belonging to a teachers' union.

Opinion on Collective Bargaining by Teachers' Union
               | For | Against | Undecided | Total
Staff          | 30  | 15      | 15        | 60
Faculty        | 50  | 10      | 40        | 100
Administration | 10  | 25      | 5         | 40
Column totals  | 90  | 50      | 60        | 200

Test the hypotheses
$H_0$: Opinion on collective bargaining is independent of employee classification
versus
$H_a$: Opinion on collective bargaining is dependent on employee classification
using $\alpha = 0.05$.

7.6.2. A random sample was taken of 300 undergraduate students from a university. The students in the sample were classified according to their gender and according to the choice of their major. The result is given in the following table.

       | College
Gender | Arts and sciences | Engineering | Business | Other | Total
Male   | 75                | 40          | 24       | 66    | 205
Female | 45                | 12          | 15       | 23    | 95
Total  | 120               | 52          | 39       | 89    | 300

Test the hypothesis that the choice of the major by undergraduate students in this university is independent of their gender. Use $\alpha = 0.01$.

7.6.3. The speeds of vehicles (in mph) passing through a section of Highway 75 are recorded for a random sample of 150 vehicles and are given below. Test the hypothesis that the speeds are normally distributed with a mean of 70 and a standard deviation of 4. Use $\alpha = 0.01$.

Range  | 40-55 | 56-65 | 66-75 | 76-85 | >85
Number | 12    | 14    | 78    | 40    | 6

7.6.4. Based on the sample data of 50 days contained in the following table, test the hypothesis that the daily mean temperatures in the city are normally distributed with mean 77 and variance 6. Use $\alpha = 0.05$.

Temperature    | 46-55 | 56-65 | 66-75 | 76-85 | 86-95
Number of days | 4     | 6     | 13    | 23    | 4

7.6.5. A presidential candidate advertises on TV by comparing his positions on some important issues with those of his opponent. After a series of advertisements, a pollster wants to know whether people have changed their opinion about the candidate. The following are the data based on a survey of 950 randomly chosen people:

Before the Advertisement Was Shown
Support the candidate | Oppose the candidate | Need to know more about the candidate | Undecided
40%                   | 20%                  | 5%                                    | 35%

After the Advertisement Was Shown
Support the candidate | Oppose the candidate | Need to know more about the candidate | Undecided
45%                   | 25%                  | 2%                                    | 28%

Let $p_i$, $i = 1, 2, 3, 4$, represent the respective true proportions. Test

$H_0$: $p_1 = 0.35$; $p_2 = 0.20$; $p_3 = 0.15$; $p_4 = 0.30$

versus

$H_a$: At least one of the probabilities is different from the hypothesized value.

Use $\alpha = 0.05$.


7.6.6. A survey of footwear preferences of a random sample of 100 undergraduate students (50 females and 50 males) from a large university resulted in the following data.

       | Boots | Leather shoes | Sneakers | Sandals | Other
Female | 12    | 9             | 12       | 10      | 7
Male   | 10    | 12            | 17       | 7       | 4

(a) Let $p_i$, $i = 1, 2, 3, 4, 5$, represent the respective true proportions of students with a particular footwear preference, and let

$H_0$: $p_1 = 0.20$; $p_2 = 0.20$; $p_3 = 0.30$; $p_4 = 0.20$; $p_5 = 0.10$

versus

$H_a$: At least one of the probabilities is different from the hypothesized value.

Test this hypothesis using $\alpha = 0.05$.
(b) Test the hypothesis that the choice of footwear by undergraduate students in this university is independent of their gender, using $\alpha = 0.05$.

7.7 CHAPTER SUMMARY


In this chapter, we have learned various aspects of hypothesis testing. First, we dealt with hypothesis 
testing for one sample where we used test procedures for testing hypotheses about true mean, true 
variance, and true proportion. Then we discussed the comparison of two populations through their 
true means, true variances, and true proportions. We also introduced the Neyman-Pearson lemma
and discussed likelihood ratio tests and chi-square tests for categorical data. 


We now list some of the key definitions in this chapter. 


Statistical hypotheses
Tests of hypotheses, tests of significance, or rules of decision
Simple hypothesis
Composite hypothesis
Type I error
Type II error
The level of significance
The p-value or attained significance level
The Smith-Satterthwaite procedure
Power of the test
Most powerful test
Likelihood ratio


In this chapter, we also learned the following important concepts and procedures: 


General method for hypothesis testing
Steps to calculate $\beta$
Steps to find the p-value
Steps in any hypothesis testing problem
Summary of hypothesis tests for $\mu$
Summary of large sample hypothesis tests for $p$
Summary of hypothesis tests for the variance $\sigma^2$
Summary of hypothesis tests for $\mu_1 - \mu_2$ for large samples ($n_1$ and $n_2 \ge 30$)
Summary of hypothesis tests for $p_1 - p_2$ for large samples
Testing for the equality of variances
Summary of testing for a matched pairs experiment
Procedure for applying the Neyman-Pearson lemma
Procedure for the likelihood ratio test
Testing the parameters of a multinomial distribution (summary)
Testing the independence of two factors
Goodness-of-fit test procedures for probability distributions


7.8 COMPUTER EXAMPLES 


In the following examples, if the value of $\alpha$ is not specified, we will always take it as 0.05.

7.8.1 Minitab Examples

Example 7.8.1
(t-Test): Consider the data

66 74 79 80 69 77 78 65 79 81

Using Minitab, test $H_0: \mu = 75$ vs. $H_a: \mu > 75$.


Solution 
Enter the data in C1. Then 


Stat > Basic Statistics > 1-sample t... > in Variables: enter C1 > choose Test Mean > enter 75 >
in Alternative: choose greater than and click OK


We obtain the following output. 


T-Test of the Mean
Test of mu = 75.00 vs mu > 75.00

Variable   N    Mean    StDev   SE Mean   T       P
C1         10   74.80   6.00    1.90      -0.11   0.54
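The summary columns of this output can be reproduced by hand. The following Python sketch (not part of the Minitab session; variable names are ours) recomputes the mean, standard deviation, standard error, and t-statistic. The p-value is left out because the standard library has no t-distribution cdf.

```python
import math
import statistics

data = [66, 74, 79, 80, 69, 77, 78, 65, 79, 81]
mu0 = 75                                   # hypothesized mean under H0

n = len(data)
mean = statistics.mean(data)
sd = statistics.stdev(data)                # sample standard deviation (n - 1 divisor)
se = sd / math.sqrt(n)                     # standard error of the mean
t = (mean - mu0) / se                      # one-sample t statistic, n - 1 df

print(f"N={n} Mean={mean:.2f} StDev={sd:.2f} SE Mean={se:.2f} T={t:.2f} DF={n - 1}")
```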


Example 7.8.2
For the following data:

Sample 1: 16 18 21 13 19 16 18 15 20 19 14 21 14
Sample 2: 14 15 10 13 11 7 12 11 12 15 14

Test $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 < \mu_2$. Use $\alpha = 0.02$.
Solution 
Enter sample 1 data in C1 and sample 2 data in C2. Then 


Stat > Basic Statistics > 2-sample t... > Choose Samples in different columns > in Alternative: 
choose less than > in Confidence level: enter 98 > click Assumed equal variances and click OK 
We obtain the following output. 


Two Sample T-Test and Confidence Interval
Two sample T for C1 vs C2

      N    Mean    StDev   SE Mean
C1    13   17.23   2.74    0.76
C2    11   12.18   2.40    0.72

98% CI for mu C1 - mu C2: (2.38, 7.71)
T-Test mu C1 = mu C2 (vs <): T = 4.75 P = 1.0 DF = 22
Both use Pooled StDev = 2.59

If we do not select Assumed equal variances, we obtain the following output.

Two Sample T-Test and Confidence Interval
Two sample T for C1 vs C2

      N    Mean    StDev   SE Mean
C1    13   17.23   2.74    0.76
C2    11   12.18   2.40    0.72

98% CI for mu C1 - mu C2: (2.40, 7.69)
T-Test mu C1 = mu C2 (vs <): T = 4.81 P = 1.0 DF = 21
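Both versions of the two-sample statistic can be recomputed from the raw data. The sketch below is an illustration with our own variable names; it uses the pooled standard deviation for the equal-variance case and the Satterthwaite approximation for the degrees of freedom otherwise. Minitab appears to truncate that approximation to an integer (DF = 21), which is an observation about the printed output, not something the text states.

```python
import math
import statistics

sample1 = [16, 18, 21, 13, 19, 16, 18, 15, 20, 19, 14, 21, 14]
sample2 = [14, 15, 10, 13, 11, 7, 12, 11, 12, 15, 14]

n1, n2 = len(sample1), len(sample2)
diff = statistics.mean(sample1) - statistics.mean(sample2)
v1, v2 = statistics.variance(sample1), statistics.variance(sample2)

# Equal variances assumed: pooled standard deviation, n1 + n2 - 2 df.
sp = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
t_pooled = diff / (sp * math.sqrt(1 / n1 + 1 / n2))

# Equal variances not assumed: separate variances, approximate df.
a, b = v1 / n1, v2 / n2
t_unpooled = diff / math.sqrt(a + b)
df_unpooled = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

print(f"pooled:   T = {t_pooled:.2f}, DF = {n1 + n2 - 2}, Pooled StDev = {sp:.2f}")
print(f"unpooled: T = {t_unpooled:.2f}, DF = {df_unpooled:.3f}")
```

Since the alternative is $\mu_1 < \mu_2$ and both statistics are positive, the sample gives no support to the alternative, which is why the reported p-value is 1.0.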




Example 7.8.3 
For the following data: 


 6.8  5.6  8.5  8.5  8.4  7.5  9.3  9.4  7.8  7.1
 9.9  9.6  9.0  9.4 13.7 16.6  9.1 10.1 10.6 11.1
 8.9 11.7 12.8 11.5 12.0 10.6 11.1  6.4 12.3 12.3
11.4  9.9 14.3 11.5 11.8 13.3 12.8 13.7 13.9 12.9
14.2 14.0 15.5 16.9 18.0 17.9 21.8 18.4 34.3

Test $H_0: \mu = 12$ versus $H_a: \mu \neq 12$. Use $\alpha = 0.05$.
Solution 
Enter the data in C1. Then 


Stat > Basic Statistics > 1-sample z... > in Variables: Type C1 > choose Test Mean and enter 12 > 
choose not equal in Alternative, and Type 4.7 for sigma > Click OK 


We obtain the following output. 


Z-Test
Test of mu = 12.000 vs mu not = 12.000
The assumed sigma = 4.70

Variable   N    Mean     StDev   SE Mean   Z      P
C1         49   12.124   4.700   0.671     0.19   0.85


Here the test statistic is 0.19 and the p-value is 0.85, which is larger than 0.05. Hence, we cannot reject the null hypothesis.

Example 7.8.4
(Contingency Table): Consider the following data with five levels and two factors. Test for dependence of the factors.

        | Levels
Factors | 1   | 2  | 3  | 4  | 5
1       | 39  | 19 | 12 | 28 | 18
2       | 172 | 61 | 44 | 70 | 37

Solution 


Enter the first column of the table (39 and 172) in C1, and continue in the same way through C5. Then

Stat > Tables > Chi-Square Test... > in Columns containing the table: Type C1 C2 C3 C4 C5 >
click OK


We will obtain the following output. 


Chi-Square Test
Expected counts are printed below observed counts

         C1       C2      C3      C4      C5      Total
1        39       19      12      28      18      116
         48.95    18.56   12.99   22.74   12.76
2        172      61      44      70      37      384
         162.05   61.44   43.01   75.26   42.24
Total    211      80      56      98      55      500

Chi-Sq = 2.023 + 0.010 + 0.076 + 1.219 + 2.152 +
         0.611 + 0.003 + 0.023 + 0.368 + 0.650 = 7.135

DF = 4, p-value = 0.129
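The expected counts and the chi-square statistic in this output follow from the row and column totals alone. A minimal Python sketch of that computation (our own variable names; the p-value is omitted because it needs a chi-square cdf that the standard library does not provide):

```python
table = [
    [39, 19, 12, 28, 18],      # factor level 1
    [172, 61, 44, 70, 37],     # factor level 2
]
row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

q2 = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        # Expected count under independence: (row total * column total) / n.
        expected = row_totals[i] * col_totals[j] / n
        q2 += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(col_totals) - 1)    # (r - 1)(c - 1)
print(f"Chi-Sq = {q2:.3f}, DF = {df}")
```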


Example 7.8.5
(Paired t-Test): Consider the data of Example 7.5.7. Using Minitab, perform a paired t-test.


Solution 
Enter sample 1 in column C1 and sample 2 in column C2. Then: 


Stat > Basic Statistics > Paired t... > in First Sample: Type C2, and in the Second sample: Type
C1 > click options > and click less than (if $\alpha$ is other than 0.05, enter appropriate percentage in
Confidence level: and enter appropriate number if it is not zero in Test mean:) > click OK > OK

We obtain the following output.


Paired T-Test and Confidence Interval

Paired T for C2 - C1

             N    Mean    StDev   SE Mean
C2           10   171.3   47.1    14.9
C1           10   243.2   40.1    12.7
Difference   10   -71.9   56.2    17.8

95% CI for mean difference: (-112.1, -31.7)
T-Test of mean difference = 0 (vs < 0): T-Value = -4.05
p-value = 0.001

We reject the null hypothesis because the p-value $0.001 < 0.05 = \alpha$.


7.8.2 SPSS Examples 


Example 7.8.6
Consider the data

66 74 79 80 69 77 78 65 79 81

Using SPSS, test $H_0: \mu = 75$ vs. $H_a: \mu > 75$.


Solution 
Use the following procedure: 


1. Enter the data in column 1. 


2. Click Analyze > Compare Means > One-sample t Test... , Move var00001 to Test Variable(s), 
and change Test Value: 0 to 75. Click OK 


We obtain the following output. 
One-Sample Statistics

           N    Mean      Std. Deviation   Std. Error Mean
VAR00001   10   74.8000   5.99630          1.89620

One-Sample Test (Test Value = 75)

                                                          95% Confidence Interval
                                                          of the Difference
           t       df   Sig. (2-tailed)   Mean Difference   Lower     Upper
VAR00001   -.105   9    .918              -.2000            -4.4895   4.0895

For the one-sample t-test of $H_0: \mu = 75$ vs. $H_a: \mu > 75$, the t-statistic is $-0.105$ with 9 degrees of freedom. SPSS reports the two-tailed significance, 0.918. Because the alternative is upper-tailed and the t-statistic is negative, the one-sided p-value is $1 - 0.918/2 = 0.54$, which agrees with the Minitab output of Example 7.8.1. Since $0.54 > 0.05$, we do not reject the null hypothesis.

If we want the computer to calculate a tail probability for the test statistic in the previous example, use the following procedure.

1. Enter the test statistic ($-0.105$) in the data editor under a variable named 'teststat'.
2. Click Transform > Compute...
3. Type 'pvalue' in the box called Target Variable. In the box called Functions: scroll and click on CDF.T(q,df) and move it to Numeric Expression.
4. CDF.T(q,df) will appear as CDF.T(?,?) in the Numeric Expression box. Replace q with teststat and df with 9 (the degrees of freedom in this example). Click OK.

We obtain 0.46, which is the lower-tail probability $P(T \le -0.105)$. For the upper-tailed alternative $H_a: \mu > 75$, the p-value is $1 - 0.46 = 0.54$.

Example 7.8.7 
For the following data

Sample 1: 16 18 21 13 19 16 18 15 20 19 14 21 14
Sample 2: 14 15 10 13 11 7 12 11 12 15 14

Test $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 < \mu_2$. Use $\alpha = 0.02$.


Solution 
In column 1, under the title “group” enter 1s to identify the sample 1 data and 2s to identify sample 2 data. 
In column C2, under the title “data” enter the data corresponding to samples 1 and 2. Then: 


Analyze > Compare Means > Independent Samples t-test... > bring Data to Test Variable(s): and 
group to Grouping Variable:, click Define Groups... , and enter 1 for sample 1, 2 for sample 2 > 
click continue > click Options... . Enter 98 in Confidence interval: > click continue > OK 


We obtain the following output. 


Group Statistics

       GROUP   N    Mean      Std. Deviation   Std. Error Mean
DATA   1.00    13   17.2308   2.74329          .76085
       2.00    11   12.1818   2.40076          .72386

Independent Samples Test

                                Levene's Test for
                                Equality of Variances   t-test for Equality of Means
                                                                                                              98% Confidence Interval
                                                                     Sig.         Mean         Std. Error     of the Difference
                                F       Sig.            t       df       (2-tailed)   Difference   Difference     Lower     Upper
DATA   Equal variances assumed       .275    .334            4.753   22       .000         5.0490       1.06237        2.38419   7.71372
       Equal variances not assumed                           4.808   21.963   .000         5.0490       1.05017        2.41443   7.68347

SPSS reports a two-tailed significance of .000 for both versions of the test. However, the alternative here is $H_a: \mu_1 < \mu_2$ and the t-statistic is positive (the mean of sample 1 exceeds the mean of sample 2), so the one-sided p-value is approximately 1, as in the Minitab output of Example 7.8.2. Because this is greater than $\alpha = 0.02$, we do not reject the null hypothesis.

Example 7.8.8 
(Paired t-Test) For the data of Example 7.5.7, use SPSS to test whether the data provide sufficient
evidence for the claim that the new program reduces blood glucose level in diabetic patients. Use $\alpha = 0.05$.


Solution 
Enter after data in column C1 and before data in column C2. Then: 


Analyze > Compare Means > Paired-Sample T-Test > bring after and before to Paired Variables: 
so that it will look after-before > click OK 


We obtain the following output. 


Paired Samples Statistics

                  Mean       N    Std. Deviation   Std. Error Mean
Pair 1   AFTER    171.3000   10   47.11228         14.89821
         BEFORE   243.2000   10   40.12979         12.69015

Paired Samples Correlations

                          N    Correlation   Sig.
Pair 1   AFTER & BEFORE   10   .179          .621

Paired Samples Test

                          Paired Differences
                                                                95% Confidence Interval
                                      Std.        Std. Error    of the Difference                     Sig.
                          Mean        Deviation   Mean          Lower       Upper       t        df   (2-tailed)
Pair 1   AFTER - BEFORE   -71.9000    56.15544    17.75791      -112.0712   -31.7288    -4.049   9    .003

The two-tailed significance for the test is 0.003, so the one-sided p-value is about 0.0015. Because this is less than $\alpha = 0.05$, we reject the null hypothesis.


7.8.3 SAS Examples 


To conduct a hypothesis test using SAS, we could use proc ttest, or proc means with the option of computing the t-value and corresponding probability. However, to use this, we need a hypothesis of the form $H_0: \mu = 0$. For testing nonzero values, $H_0: \mu = \mu_0$, we must create a new variable by subtracting $\mu_0$ from each observation, and then use the test procedure for this new variable. The following example illustrates this concept.




Example 7.8.9 


(t-Test): The following radar measurements of speed (in miles per hour) are obtained for 10 vehicles traveling on a stretch of interstate highway.

66 74 79 80 69 77 78 65 79 81

Do the data provide sufficient evidence to indicate that the mean speed at which people travel on this stretch of highway is at least 75 mph? Test using $\alpha = 0.01$. Use an SAS procedure to do the analysis.


Solution 


In the SAS editor, type in the following commands. 


data speed;
title 'Test on highway speed';
input X @@;
Y=X-75;
datalines;
66 74 79 80 69 77 78 65 79 81
;
PROC TTEST data=speed;
run;


We obtain the following output. 


Test on highway speed

The TTEST Procedure

Statistics

                 Lower CL           Upper CL   Lower CL             Upper CL
Variable   N     Mean       Mean    Mean       Std Dev    Std Dev   Std Dev    Std Err
X          10    70.511     74.8    79.089     4.1245     5.9963    10.947     1.8962
Y          10    -4.489     -0.2    4.0895     4.1245     5.9963    10.947     1.8962

T-Tests

Variable   DF   t Value   Pr > |t|
X          9    39.45     <.0001
Y          9    -0.11     0.9183


To test $H_0: \mu = 75$, we need to look at the Y-values. The corresponding t-value is $-0.11$, and the reported two-sided probability is 0.9183. Because this is a one-sided test, we divide 0.9183 by 2 to obtain the lower-tail probability 0.45915; for the upper-tailed alternative $H_a: \mu > 75$ the p-value is $1 - 0.45915 = 0.54085$, which agrees with the Minitab output of Example 7.8.1. In either case the p-value is larger than $0.01 = \alpha$, so we cannot reject the null hypothesis.

One of the easier ways to conduct large sample hypothesis testing using SAS procedures is through the computation of the p-value. The following example illustrates the procedure.


Example 7.8.10
(z-Test): It is claimed that the average miles driven per year for sports cars is at least 18,000 miles. To check the claim, a consumer firm tests 40 of these cars randomly and obtains a mean of 17,463 miles with standard deviation of 1348 miles. What can it conclude if $\alpha = 0.01$?


Solution 
Here we will find the p-value and compare that with a to test the hypothesis. We use the following SAS 


procedure: 


data ex888;
z=(17463-18000)/(1348/(SQRT(40)));
pval=probnorm(z);
run;
proc print data=ex888;
title 'Test of mean, large sample';
run;


We obtain the following output. 


Test of mean, large sample
Obs   z          pval
1     -2.51950   .005876079

Because the p-value of 0.005876079 is less than $\alpha = 0.01$, we reject the null hypothesis. There is sufficient evidence to conclude that the mean miles driven per year for sports cars is less than 18,000.


Note that in the previous example the alternative was lower-tailed, so the p-value is the lower-tail probability probnorm(z). For an upper-tailed alternative, use pval=probnorm(-z);. For a two-sided hypothesis we need to double the smaller tail probability, so use pval=2*probnorm(-abs(z)); to obtain the p-value.
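The same tail probabilities can be checked outside SAS. The Python sketch below is an illustration with our own names; `phi` plays the role of probnorm and is written with the error function from the standard library. It computes the z-statistic of Example 7.8.10 and the p-values for the three possible alternatives.

```python
import math


def phi(z):
    """Standard normal cdf, the counterpart of SAS probnorm."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


xbar, mu0, s, n = 17463, 18000, 1348, 40
z = (xbar - mu0) / (s / math.sqrt(n))

p_lower = phi(z)                   # H_a: mu < mu0 (the case in Example 7.8.10)
p_upper = phi(-z)                  # H_a: mu > mu0
p_two_sided = 2 * phi(-abs(z))     # H_a: mu != mu0

print(f"z = {z:.5f}")
print(f"lower-tailed p = {p_lower:.6f}")
print(f"upper-tailed p = {p_upper:.6f}")
print(f"two-sided p    = {p_two_sided:.6f}")
```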


Example 7.8.11
(Paired t-Test): For the data of Example 7.5.7, use SAS to test whether the data provide sufficient evidence for the claim that the new program reduces blood glucose level in diabetic patients. Use $\alpha = 0.05$.


Solution 
We can use the following commands.

data dietexr;
input before after;
diff = after - before;
datalines;
268 106
225 186
252 223
192 110
307 203
228 101
246 211
298 176
231 194
185 203
;
run;
proc means data=dietexr t prt;
var diff;
title 'Test of mean, Paired difference';
run;


We obtain the following output. 


Test of mean, Paired difference
The MEANS Procedure
Analysis Variable : diff

t Value   Pr > |t|
-4.05     0.0029

Because the p-value 0.0029 is less than $\alpha = 0.05$, we reject the null hypothesis.
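The paired statistic is just a one-sample t-statistic on the differences, which the following Python sketch makes explicit. The ten (before, after) pairs are taken to be the Example 7.5.7 data; they reproduce the means and standard deviations reported in the Minitab and SPSS outputs of this section. Variable names are ours.

```python
import math
import statistics

before = [268, 225, 252, 192, 307, 228, 246, 298, 231, 185]
after = [106, 186, 223, 110, 203, 101, 211, 176, 194, 203]

diffs = [a - b for a, b in zip(after, before)]   # diff = after - before
n = len(diffs)
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)                   # sample standard deviation of the differences
t = mean_d / (sd_d / math.sqrt(n))               # tests H0: mean difference = 0, n - 1 df

print(f"mean diff = {mean_d:.1f}, sd = {sd_d:.2f}, t = {t:.2f}, df = {n - 1}")
```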


PROJECTS FOR CHAPTER 7 


7A. Testing on Computer-Generated Samples
(a) Small sample test:
Generate a sample of size 20 from a normal population with $\mu = 10$ and $\sigma^2 = 4$.
(i) Perform a t-test for the test $H_0: \mu = 10$ versus $H_a: \mu \neq 10$ at level $\alpha = 0.05$.
(ii) Perform the test $H_0: \sigma^2 = 4$ versus $H_a: \sigma^2 \neq 4$ at level $\alpha = 0.05$.
Repeat the procedure 10 times, and comment on the results.
(b) Large sample test:
Generate a sample of size 50 from a normal population with $\mu = 10$ and $\sigma^2 = 4$. Perform a z-test for the test $H_0: \mu = 10$ versus $H_a: \mu \neq 10$ at level $\alpha = 0.05$. Repeat the procedure 10 times and comment on the results.


7B. Conducting a Statistical Test with Confidence Interval 
Let $\theta$ be any population parameter. Consider the three tests of hypotheses

$H_0: \theta = \theta_0$ vs. $H_a: \theta > \theta_0$   (1)
$H_0: \theta = \theta_0$ vs. $H_a: \theta < \theta_0$   (2)
$H_0: \theta = \theta_0$ vs. $H_a: \theta \neq \theta_0$   (3)

The following procedure can be exploited to test a statistical hypothesis utilizing the confidence intervals.

Procedure to Use Confidence Interval for Hypothesis Testing
Let $\theta$ be any population parameter.
(a) For test (1), that is,

$H_0: \theta = \theta_0$ vs. $H_a: \theta > \theta_0$

choose a value for $\alpha$. From a random sample, compute a confidence interval for $\theta$ using a confidence coefficient equal to $1 - 2\alpha$. Let $L$ be the lower end point of this confidence interval.

Reject $H_0$ if $\theta_0 < L$.

That is, we will reject the null hypothesis if the confidence interval is completely to the right of $\theta_0$.
(b) For test (2), that is,

$H_0: \theta = \theta_0$ vs. $H_a: \theta < \theta_0$

choose a value for $\alpha$. From a random sample, compute a confidence interval for $\theta$ using a confidence coefficient equal to $1 - 2\alpha$. Let $U$ be the upper end point of this confidence interval.

Reject $H_0$ if $U < \theta_0$.

That is, we will reject the null hypothesis if the confidence interval is completely to the left of $\theta_0$.
(c) For test (3), that is,

$H_0: \theta = \theta_0$ vs. $H_a: \theta \neq \theta_0$

choose a value for $\alpha$. From a random sample, compute a confidence interval for $\theta$ using a confidence coefficient equal to $1 - \alpha$. Let $L$ be the lower end point and $U$ be the upper end point of this confidence interval.

Reject $H_0$ if $\theta_0 < L$ or $U < \theta_0$.

That is, we will reject the null hypothesis if the confidence interval does not contain $\theta_0$.
(i) For any large data set, conduct all three of these hypothesis tests using a confidence interval for the population mean.
(ii) For any small data set, conduct all three of these hypothesis tests using a confidence interval for the population mean.
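The three decision rules of Project 7B can be written as one small function. This is only a sketch: the function name `reject_by_interval` and the labels "greater", "less", and "two-sided" are our own, and the caller is assumed to have built the interval with the right confidence coefficient ($1 - 2\alpha$ for the one-sided tests, $1 - \alpha$ for the two-sided test). The two calls reuse intervals printed in the Minitab outputs of Section 7.8.1.

```python
def reject_by_interval(theta0, lower, upper, alternative):
    """Decide H0: theta = theta0 from a confidence interval (lower, upper).

    The interval is assumed to have confidence coefficient 1 - 2*alpha for
    the one-sided alternatives and 1 - alpha for the two-sided alternative.
    """
    if alternative == "greater":        # test (1): reject if theta0 < L
        return theta0 < lower
    if alternative == "less":           # test (2): reject if U < theta0
        return upper < theta0
    if alternative == "two-sided":      # test (3): reject if theta0 is outside (L, U)
        return theta0 < lower or upper < theta0
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


# 95% interval for the mean paired difference (Example 7.8.5), two-sided test at alpha = 0.05.
print(reject_by_interval(0, -112.1, -31.7, "two-sided"))

# 98% interval for mu1 - mu2 (Example 7.8.2), lower-tailed test at alpha = 0.01.
print(reject_by_interval(0, 2.38, 7.71, "less"))
```

The first call rejects $H_0$ because zero lies outside the interval; the second does not, because the whole interval lies to the right of zero while the alternative points to the left.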


Chapter 8

Linear Regression Models

Objective: In this chapter we will study linear relationships in sample data and use the method of least squares to estimate the necessary parameters.

English scientist Sir Francis Galton (1822-1911), a cousin of Charles Darwin, made significant contributions to both genetics and psychology. He is the inventor of regression and a pioneer in applying statistics to biology. One of the data sets that he considered consisted of the heights of fathers and first sons. He was interested in predicting the height of a son based on the height of his father. Looking at the scatterplots of these heights, Galton saw that the trend was linear and increasing. After fitting a line to these data (using the techniques described in this chapter), he observed that the regression line predicted that fathers taller than average tended to have sons shorter than themselves, and fathers shorter than average tended to have sons taller than themselves. There is a regression toward the mean. That is how the method of this chapter got its name: regression.


8.1 INTRODUCTION 


In earlier chapters, we were primarily concerned about inferences on population parameters. In this chapter, we examine the relationship between two or more variables and create a model that can be used for predictive purposes. For example, consider the question "Is there statistical evidence to conclude that the countries with the highest average blood-cholesterol levels have the greatest incidence of heart disease?" It is important to answer this if we want to make appropriate lifestyle and medical choices. We will study the relationship between variables using regression analysis. Our aim is to create a model and study inferential procedures when one dependent and several independent variables are present. We denote by $Y$ the random variable to be predicted, also called the dependent variable (or response variable), and by $x_i$ the independent (or predictor) variables used to model (or predict) $Y$. For example, let $(x, y)$ denote the height and weight of an adult male. Our interest may be to find the relationship between height and weight from sample measurements of $n$ individuals. The process of finding a mathematical equation that best fits the noisy data is known as regression analysis. In his book Natural Inheritance, Sir Francis Galton introduced the word regression in 1889 to describe certain genetic relationships. The technique of regression is one of the most popular statistical tools to study the dependence of one variable with respect to another. There are different forms of regression: simple linear, nonlinear, multiple, and others. The primary use of a regression model is prediction. When using a model to predict $Y$ for a particular set of values of $x_1, \ldots, x_k$, one may want to know how large the error of prediction might be. Regression analysis, in general after collecting the sample data, involves the following steps.


PROCEDURE FOR REGRESSION MODELING
1. Hypothesize the form of the model as $Y = f(x_1, \ldots, x_k; \beta_0, \beta_1, \ldots, \beta_k) + \varepsilon$. Here $\varepsilon$ represents the random error term. We assume that $E(\varepsilon) = 0$ but $\mathrm{Var}(\varepsilon) = \sigma^2$ is unknown. From this we can obtain $E(Y) = f(x_1, \ldots, x_k; \beta_0, \beta_1, \ldots, \beta_k)$.
2. Use the sample data to estimate unknown parameters in the model.
3. Check for goodness of fit of the proposed model.
4. Use the model for prediction.

The function $f(x_1, \ldots, x_k; \beta_0, \beta_1, \ldots, \beta_k)$ $(k \ge 1)$ contains the independent or predictor variables $x_1, \ldots, x_k$ (assumed to be nonrandom) and unknown parameters or weights $\beta_0, \beta_1, \ldots, \beta_k$, and $\varepsilon$ represents the random or error variable. We now proceed to introduce the simplest form of a regression model, called simple linear regression.


8.2 THE SIMPLE LINEAR REGRESSION MODEL 


Consider a random sample of $n$ observations of the form $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where $X$ is the independent variable and $Y$ is the dependent variable, both being scalars. A preliminary descriptive technique for determining the form of relationship between $X$ and $Y$ is the scatter diagram. A scatter diagram is drawn by plotting the sample observations in Cartesian coordinates. The pattern of the points gives an indication of a linear or nonlinear relationship between the variables.


In Figure 8.1a, the relationship between x and y is fairly linear, whereas the relationship is somewhat 
like a parabola in Figure 8.1b, and in Figure 8.1c there is no obvious relationship between the variables. 


Once the scatter diagram reveals a linear relationship, the problem then is to find the linear model 
that best fits the given data. To this end, we will first give a general definition of a linear statistical 
model, called a multiple linear regression model. 


FIGURE 8.1 Scatter diagram: (a) linear relationship; (b) quadratic relationship; (c) no relationship.

Definition 8.2.1 A multiple linear regression model relating a random response $Y$ to a set of predictor variables $x_1, \ldots, x_k$ is an equation of the form

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon$$

where $\beta_0, \ldots, \beta_k$ are unknown parameters, $x_1, \ldots, x_k$ are the independent nonrandom variables, and $\varepsilon$ is a random variable representing an error term. We assume that $E(\varepsilon) = 0$, or equivalently,

$$E(Y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k.$$


To understand the basic concepts of regression analysis we shall consider a single dependent variable $Y$ and a single independent nonrandom variable $x$. We assume that there are no measurement errors in $x_i$. The possible measurement errors in $y$ and the uncertainties in the assumed model are expressed through the random error $\varepsilon$. Our inability to provide an exact model for a natural phenomenon is expressed through the random term $\varepsilon$, which will have a specified probability distribution (such as a normal) with mean zero. Thus, one can think of $Y$ as having a deterministic component, $E(Y)$, and a random component, $\varepsilon$. If we take $k = 1$ in the multiple linear regression model, we have a simple linear regression model.


Definition 8.2.2 If $Y = \beta_0 + \beta_1 x + \varepsilon$, this is called a simple linear regression model. Here, $\beta_0$ is the y-intercept of the line and $\beta_1$ is the slope of the line. The term $\varepsilon$ is the error component.

This basic linear model assumes the existence of a linear relationship between the variables $x$ and $y$ that is disturbed by a random error $\varepsilon$. The known data points are the pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$; the problem of simple linear regression is to fit a straight line optimal in some sense to the set of data, as shown in Figure 8.2.


FIGURE 8.2 Scatterplot and least-squares regression line.

Now, the problem becomes one of finding estimators for $\beta_0$ and $\beta_1$. Once we obtain the "good" estimators $\hat{\beta}_0$ and $\hat{\beta}_1$, we can fit a line to the data given by the prediction equation $\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x$. The question then becomes whether this predicted line gives the "best" (in some sense) description of the data. We now describe the most widely used technique, called the method of least squares, to obtain the estimators or weights of the parameters.


8.2.1 The Method of Least Squares 


As stated, $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are the $n$ observed data points, with corresponding errors $\varepsilon_i$, $i = 1, \ldots, n$. That is,

$$Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n.$$

We assume that the errors $\varepsilon_i$, $i = 1, \ldots, n$, are independent and identically distributed with $E(\varepsilon_i) = 0$ and $\mathrm{Var}(\varepsilon_i) = \sigma^2$, $i = 1, \ldots, n$. One of the ways to decide on how well a straight line fits the set of data is to determine the extent to which the data points deviate from the line. The straight line model for the response $Y$ for a given $x$ is

$$Y = \beta_0 + \beta_1 x + \varepsilon.$$

Because we assumed that $E(\varepsilon) = 0$, the expected value of $Y$ is given by

$$E(Y) = \beta_0 + \beta_1 x.$$

The estimator of $E(Y)$, denoted by $\hat{Y}$, can be obtained by using the estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ of the parameters $\beta_0$ and $\beta_1$, respectively. Then, the fitted regression line we are looking for is given by

$$\hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 x.$$

For observed values $(x_i, y_i)$, we obtain the estimated value of $y_i$ as

$$\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i.$$

The deviation of observed $y_i$ from its predicted value $\hat{y}_i$, called the $i$th residual, is defined by

$$e_i = (y_i - \hat{y}_i) = \left[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)\right].$$

The residuals, or errors $e_i$, are the vertical distances between the observed and predicted values of the $y_i$'s (Figure 8.3).


FIGURE 8.3 Illustration of $e_i$.

Definition 8.2.3 The sum of squares for errors (SSE) or sum of squares of the residuals for all of the $n$ data points is

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)\right]^2.$$


The least-squares approach to estimation is to find $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the sum of squared residuals, SSE. Thus, in the method of least squares, we choose $\hat{\beta}_0$ and $\hat{\beta}_1$ so that SSE is a minimum. The quantities $\hat{\beta}_0$ and $\hat{\beta}_1$ that make the SSE a minimum are called the least-squares estimates of the parameters $\beta_0$ and $\beta_1$, and the corresponding line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ is called the least-squares line.

Definition 8.2.4 The least-squares line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$ is one that satisfies the following property:

$$SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

is smaller than for any other straight line model with

$$SE = \sum_{i=1}^{n} (y_i - \hat{y}_i) = 0.$$

Thus, the least-squares line is a line of the form $y = b_0 + b_1 x$ for which the error sum of squares $\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2$ is a minimum. The minimum is taken over all values of $b_0$ and $b_1$, and $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$ are observed data pairs.

The problem of fitting a least-squares line now reduces to finding the quantities $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize the error sum of squares.


8.2.2 Derivation of Bo and By 


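Now we derive expressions for $\hat{\beta}_0$ and $\hat{\beta}_1$. If SSE attains a minimum, then the partial derivatives of
SSE with respect to $\beta_0$ and $\beta_1$ are zero. That is,

$$
\begin{aligned}
\frac{\partial\,\mathrm{SSE}}{\partial \beta_0}
&= \frac{\partial}{\partial \beta_0} \sum_{i=1}^{n} \left[y_i - (\beta_0 + \beta_1 x_i)\right]^2 \\
&= -\sum_{i=1}^{n} 2\left[y_i - (\beta_0 + \beta_1 x_i)\right] \\
&= -2\left(\sum_{i=1}^{n} y_i - n\beta_0 - \beta_1 \sum_{i=1}^{n} x_i\right) = 0
\end{aligned}
\tag{8.1}
$$

and

$$
\begin{aligned}
\frac{\partial\,\mathrm{SSE}}{\partial \beta_1}
&= \frac{\partial}{\partial \beta_1} \sum_{i=1}^{n} \left[y_i - (\beta_0 + \beta_1 x_i)\right]^2 \\
&= -\sum_{i=1}^{n} 2\left[y_i - (\beta_0 + \beta_1 x_i)\right] x_i \\
&= -2\left(\sum_{i=1}^{n} x_i y_i - \beta_0 \sum_{i=1}^{n} x_i - \beta_1 \sum_{i=1}^{n} x_i^2\right) = 0.
\end{aligned}
\tag{8.2}
$$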


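Equations (8.1) and (8.2) are called the least squares equations for estimating the parameters of a line.
From (8.1) and (8.2) we obtain a set of linear equations called the normal equations,

$$\sum_{i=1}^{n} y_i = n\hat{\beta}_0 + \hat{\beta}_1 \sum_{i=1}^{n} x_i \tag{8.3}$$

and

$$\sum_{i=1}^{n} x_i y_i = \hat{\beta}_0 \sum_{i=1}^{n} x_i + \hat{\beta}_1 \sum_{i=1}^{n} x_i^2. \tag{8.4}$$

Solving for $\hat{\beta}_0$ and $\hat{\beta}_1$ from Equations (8.3) and (8.4), we obtain

$$
\hat{\beta}_1
= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
= \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}
= \frac{\sum_{i=1}^{n} x_i y_i - \dfrac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n}}{\sum_{i=1}^{n} x_i^2 - \dfrac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}
\tag{8.5}
$$

and

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}. \tag{8.6}$$

To simplify the formula for $\hat{\beta}_1$, set

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}
\quad\text{and}\quad
S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n};$$

we can rewrite (8.5) as

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}.$$

It can be shown (by using the second derivatives) that (8.5) and (8.6) do indeed minimize SSE. Now
we will summarize the procedure for fitting a least-squares line.

PROCEDURE FOR FITTING A LEAST-SQUARES LINE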
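1. Form the n data points $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, and compute the following quantities:
$\sum_{i=1}^{n} x_i$, $\sum_{i=1}^{n} x_i^2$, $\sum_{i=1}^{n} y_i$, $\sum_{i=1}^{n} y_i^2$, and $\sum_{i=1}^{n} x_i y_i$. Also compute the sample means,
$\bar{x} = (1/n)\sum_{i=1}^{n} x_i$ and $\bar{y} = (1/n)\sum_{i=1}^{n} y_i$.
2. Compute

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = \sum_{i=1}^{n} (x_i - \bar{x})^2$$

and

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}).$$

3. Compute $\hat{\beta}_0$ and $\hat{\beta}_1$ by substituting the computed quantities from steps 1 and 2 into the
equations

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}$$

and

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}.$$

4. The fitted least-squares line is

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x.$$

5. For a graphical representation, in the xy-plane, plot all the data points and draw the least-squares
line obtained in step 4.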
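
The steps of this procedure translate almost line for line into code. The following is a minimal sketch in Python, assuming the data arrive as two equal-length lists; the function name `fit_least_squares` and the four toy points (which lie exactly on $y = 1 + 2x$) are illustrative choices and do not come from the text.

```python
def fit_least_squares(x, y):
    """Return (b0, b1): least-squares intercept and slope computed as S_xy / S_xx."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    # Step 1: the basic sums
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))

    # Step 2: S_xx and S_xy
    s_xx = sum_xx - sum_x ** 2 / n
    s_xy = sum_xy - sum_x * sum_y / n
    if s_xx == 0:
        raise ValueError("all x values are equal; the slope is undefined")

    # Step 3: slope first, then intercept
    b1 = s_xy / s_xx
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1


# Hypothetical data lying exactly on the line y = 1 + 2x
b0, b1 = fit_least_squares([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```

The guard on $S_{xx} = 0$ reflects the fact that the slope formula divides by $S_{xx}$, so at least two distinct x-values are needed.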


Once we have found the best-fit combination of the two parameters $\hat{\beta}_0$ and $\hat{\beta}_1$, any
deviation of either parameter away from its optimum value will cause the error sum of squares
to increase. Thus, the optimum pair $(\hat{\beta}_0, \hat{\beta}_1)$ is a global minimum
point of the error sum of squares among all possible values of $\beta_0$ and $\beta_1$ for the given
data set.
Example 8.2.1 
Use the method of least squares to fit a straight line to the accompanying data points. Give the estimates
of $\beta_0$ and $\beta_1$. Plot the points and sketch the fitted least-squares line. The observed data values are given in
the following table.

x | -1 | 0 | 2 | -2 | 5 | 6 | 8 | 11 | 12 | -3
y | -5 | -4 | 2 | -7 | 6 | 9 | 13 | 21 | 20 | -9


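Solution
Form a table to compute various terms.

$x_i$ | $y_i$ | $x_i y_i$ | $x_i^2$
-1 | -5 | 5 | 1
0 | -4 | 0 | 0
2 | 2 | 4 | 4
-2 | -7 | 14 | 4
5 | 6 | 30 | 25
6 | 9 | 54 | 36
8 | 13 | 104 | 64
11 | 21 | 231 | 121
12 | 20 | 240 | 144
-3 | -9 | 27 | 9
$\sum x_i = 38$ | $\sum y_i = 46$ | $\sum x_i y_i = 709$ | $\sum x_i^2 = 408$

$$S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} = 408 - \frac{(38)^2}{10} = 263.6,$$

$$S_{xy} = \sum_{i=1}^{n} x_i y_i - \frac{\left(\sum_{i=1}^{n} x_i\right)\left(\sum_{i=1}^{n} y_i\right)}{n} = 709 - \frac{(38)(46)}{10} = 534.2,$$

$$\bar{x} = 3.8 \quad\text{and}\quad \bar{y} = 4.6.$$

Therefore,

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} = \frac{534.2}{263.6} = 2.0266$$

and

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} = 4.6 - (2.0266)(3.8) = -3.1011.$$

Hence, the least-squares line for these data is

$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x = -3.1011 + 2.0266x$$

and its plot is shown in Figure 8.4.

FIGURE 8.4 Simple regression line.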
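
As a check on the hand calculation, the sums and estimates of Example 8.2.1 can be recomputed in a few lines. This is only a sketch in plain Python; the variable names are illustrative. Because the code carries full precision, the intercept comes out as about -3.1009, whereas the value -3.1011 in the example results from multiplying the slope after it has been rounded to 2.0266.

```python
# Data of Example 8.2.1
x = [-1, 0, 2, -2, 5, 6, 8, 11, 12, -3]
y = [-5, -4, 2, -7, 6, 9, 13, 21, 20, -9]
n = len(x)

# Column totals of the worked table
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(a * b for a, b in zip(x, y))
sum_xx = sum(a * a for a in x)

# S_xx, S_xy, then slope and intercept
s_xx = sum_xx - sum_x ** 2 / n
s_xy = sum_xy - sum_x * sum_y / n
b1 = s_xy / s_xx
b0 = sum_y / n - b1 * sum_x / n

print(f"sums: {sum_x}, {sum_y}, {sum_xy}, {sum_xx}")
print(f"S_xx = {s_xx:.1f}, S_xy = {s_xy:.1f}")
print(f"slope = {b1:.4f}, intercept = {b0:.4f}")
```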
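Recall that for the regression line $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x$, we have defined SSE to be

$$\mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2.$$

We now show that

$$\mathrm{SSE} = S_{yy} - \hat{\beta}_1 S_{xy}, \quad\text{where}\quad S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = \sum_{i=1}^{n} (y_i - \bar{y})^2.$$

We know that

$$
\begin{aligned}
\mathrm{SSE} &= \sum_{i=1}^{n} \left(y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i\right)^2 \\
&= \sum_{i=1}^{n} \left(y_i - \bar{y} + \hat{\beta}_1 \bar{x} - \hat{\beta}_1 x_i\right)^2 \\
&= \sum_{i=1}^{n} \left[(y_i - \bar{y}) - \hat{\beta}_1 (x_i - \bar{x})\right]^2 \\
&= \sum_{i=1}^{n} (y_i - \bar{y})^2 + \hat{\beta}_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 - 2\hat{\beta}_1 \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \\
&= S_{yy} + \hat{\beta}_1^2 S_{xx} - 2\hat{\beta}_1 S_{xy}.
\end{aligned}
$$

Recall that $\hat{\beta}_1 = S_{xy}/S_{xx}$.

Substituting for $\hat{\beta}_1$, we obtain

$$\mathrm{SSE} = S_{yy} + \hat{\beta}_1 \frac{S_{xy}}{S_{xx}} S_{xx} - 2\hat{\beta}_1 S_{xy} = S_{yy} - \hat{\beta}_1 S_{xy}.$$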


8.2.3 Quality of the Regression 


Once we obtain the linear model, the question is, How well does this line fit the data? We could make
use of the residuals

$$e_i = y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i$$

to answer the question and to assess the quality of the fit. If our model is good, then the residual $e_i$
should be close to the random error $\varepsilon_i$ with mean zero. Furthermore, the residuals should contain
little or no information about the model, and there should be no recognizable pattern. If we plot the
residuals versus the independent variable on the x-axis, ideally, the plot should look like a horizontal
blur, the residuals showing no relationship to the x-values, as shown by Figure 8.5. Otherwise, the
plot reveals a poor fit to the given data, as shown by Figure 8.6, and we need to improve our
model specification. Thus, a systematic trend in the plot of residuals $e_i$ versus $x_i$ or $\hat{y}_i$ $(i = 1, \ldots, n)$
indicates that the assumed regression model is not correct.


FIGURE 8.5 Good fit.


FIGURE 8.6 Not a good fit.


Whereas the residual plots give us a visual representation of the quality of fit, a numerical measure
of how well the regression explains the data is obtained by calculating the coefficient of determination,
also called the $R^2$ of the regression. This is discussed in Project 8B. Regression analysis with any
of the standard statistical software packages will contain an output value of the $R^2$. This value will
be between 0 and 1; closer to 1 means a better fit. For example, if the value of $R^2$ is 0.85, the
regression captures 85% of the variation in the dependent variable. This is generally considered a good
regression.
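
To make the residual check and the $R^2$ value concrete, here is a small sketch in plain Python using the data of Example 8.2.1. This section does not define $R^2$ (it is deferred to Project 8B), so the code assumes the usual definition $R^2 = 1 - \mathrm{SSE}/\sum (y_i - \bar{y})^2$; the variable names are illustrative.

```python
# Data of Example 8.2.1
x = [-1, 0, 2, -2, 5, 6, 8, 11, 12, -3]
y = [-5, -4, 2, -7, 6, 9, 13, 21, 20, -9]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((a - x_bar) ** 2 for a in x)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Residuals e_i = y_i - (b0 + b1 * x_i)
residuals = [b - (b0 + b1 * a) for a, b in zip(x, y)]

# Assumed definition: R^2 = 1 - SSE / (total variation of y about its mean)
sse = sum(e * e for e in residuals)
total_variation = sum((b - y_bar) ** 2 for b in y)
r_squared = 1 - sse / total_variation

for a, e in zip(x, residuals):
    print(f"x = {a:3d}   residual = {e:+.3f}")
print(f"sum of residuals = {sum(residuals):.2e}")
print(f"R^2 = {r_squared:.4f}")
```

The residuals of a least-squares line with an intercept always sum to zero (up to rounding), so a nonzero sum signals an arithmetic error rather than a poor fit; the pattern of the residuals against x is what indicates fit quality.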


8.2.4 Properties of the Least-Squares Estimators for the Model $Y = \beta_0 + \beta_1 x + \varepsilon$

We discussed in Chapter 4 the concept of sampling distribution of sample statistics such as that of
$\bar{X}$. Similarly, knowledge of the distributional properties of the least-squares estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ is
necessary to allow any statistical inferences to be made about them. The following result gives the
sampling distribution of the least-squares estimators.


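Theorem 8.2.1 Let $Y = \beta_0 + \beta_1 x + \varepsilon$ be a simple linear regression model with $\varepsilon \sim N(0, \sigma^2)$, and let the
errors $\varepsilon_i$ associated with different observations $y_i$ $(i = 1, \ldots, n)$ be independent. Then

(a) $\hat{\beta}_0$ and $\hat{\beta}_1$ have normal distributions.
(b) The mean and variance are given by

$$E(\hat{\beta}_0) = \beta_0, \qquad \mathrm{Var}(\hat{\beta}_0) = \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\sigma^2,$$

and

$$E(\hat{\beta}_1) = \beta_1, \qquad \mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{S_{xx}},$$

where $S_{xx} = \sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2$. In particular, the least-squares estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased
estimators of $\beta_0$ and $\beta_1$, respectively.

Proof. We know that

$$
\begin{aligned}
\hat{\beta}_1 &= \frac{S_{xy}}{S_{xx}} \\
&= \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})(Y_i - \bar{Y}) \\
&= \frac{1}{S_{xx}} \left[\sum_{i=1}^{n} (x_i - \bar{x}) Y_i - \bar{Y} \sum_{i=1}^{n} (x_i - \bar{x})\right] \\
&= \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x}) Y_i,
\end{aligned}
$$

where the last equality follows from the fact that $\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = 0$. Because $Y_i$ is normally
distributed, the sum $\frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x}) Y_i$ is also normal. Furthermore,

$$
\begin{aligned}
E[\hat{\beta}_1] &= \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x}) E[Y_i] \\
&= \frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x})(\beta_0 + \beta_1 x_i) \\
&= \frac{\beta_0}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x}) + \frac{\beta_1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x}) x_i \\
&= \beta_1 \frac{1}{S_{xx}} \left[\sum_{i=1}^{n} x_i^2 - \bar{x} \sum_{i=1}^{n} x_i\right] \\
&= \beta_1 \frac{1}{S_{xx}} \left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right] \\
&= \beta_1 \frac{1}{S_{xx}} S_{xx} = \beta_1.
\end{aligned}
$$

For the variance we have

$$
\begin{aligned}
\mathrm{Var}[\hat{\beta}_1] &= \mathrm{Var}\left[\frac{1}{S_{xx}} \sum_{i=1}^{n} (x_i - \bar{x}) Y_i\right] \\
&= \frac{1}{S_{xx}^2} \sum_{i=1}^{n} (x_i - \bar{x})^2\, \mathrm{Var}[Y_i] \quad \text{(since the } Y_i\text{'s are independent)} \\
&= \frac{\sigma^2}{S_{xx}^2} \sum_{i=1}^{n} (x_i - \bar{x})^2 \quad \left(\mathrm{Var}(Y_i) = \mathrm{Var}(\beta_0 + \beta_1 x_i + \varepsilon_i) = \mathrm{Var}(\varepsilon_i) = \sigma^2\right) \\
&= \frac{\sigma^2}{S_{xx}}.
\end{aligned}
$$

Note that both $\bar{Y}$ and $\hat{\beta}_1$ are normal random variables. It can be shown that they are also independent
(see Exercise 8.3.3). Because $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{x}$ is a linear combination of $\bar{Y}$ and $\hat{\beta}_1$, it is also normal.
Now,

$$
\begin{aligned}
E[\hat{\beta}_0] &= E[\bar{Y} - \hat{\beta}_1 \bar{x}] = E[\bar{Y}] - \bar{x} E[\hat{\beta}_1] \\
&= \frac{1}{n} \sum_{i=1}^{n} E[Y_i] - \bar{x}\beta_1 = \frac{1}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 x_i) - \bar{x}\beta_1 \\
&= \beta_0 + \bar{x}\beta_1 - \bar{x}\beta_1 = \beta_0.
\end{aligned}
$$

The variance of $\hat{\beta}_0$ is given by

$$
\begin{aligned}
\mathrm{Var}[\hat{\beta}_0] &= \mathrm{Var}[\bar{Y} - \hat{\beta}_1 \bar{x}] \\
&= \mathrm{Var}[\bar{Y}] + \bar{x}^2\, \mathrm{Var}[\hat{\beta}_1] \quad \text{(since } \bar{Y} \text{ and } \hat{\beta}_1 \text{ are independent)} \\
&= \frac{\sigma^2}{n} + \bar{x}^2 \frac{\sigma^2}{S_{xx}} \\
&= \left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\sigma^2.
\end{aligned}
$$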
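
The conclusions of Theorem 8.2.1 can also be checked empirically. The sketch below, in plain Python, repeatedly simulates responses from a model with known parameters and compares the sample means and variances of the fitted coefficients with the theoretical values. The true parameters, the number of replications, and the seed are hypothetical choices made only for illustration; the x-values are borrowed from Example 8.2.1.

```python
import random


def fit(x, y):
    """Least-squares intercept and slope for paired lists x, y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)


# Fixed design points and hypothetical true parameters
x = [-1, 0, 2, -2, 5, 6, 8, 11, 12, -3]
beta0, beta1, sigma = -3.0, 2.0, 1.0
reps = 5000

rng = random.Random(0)  # fixed seed so the run is repeatable
b0s, b1s = [], []
for _ in range(reps):
    # Simulate Y_i = beta0 + beta1 * x_i + eps_i with eps_i ~ N(0, sigma^2)
    y = [beta0 + beta1 * a + rng.gauss(0.0, sigma) for a in x]
    b0, b1 = fit(x, y)
    b0s.append(b0)
    b1s.append(b1)

n = len(x)
x_bar = mean(x)
s_xx = sum((a - x_bar) ** 2 for a in x)
var_b0_theory = (1 / n + x_bar ** 2 / s_xx) * sigma ** 2
var_b1_theory = sigma ** 2 / s_xx

print(f"intercept: mean {mean(b0s):.4f}, variance {variance(b0s):.5f}, theory {var_b0_theory:.5f}")
print(f"slope:     mean {mean(b1s):.4f}, variance {variance(b1s):.5f}, theory {var_b1_theory:.5f}")
```

With a few thousand replications, the simulated means should sit close to the true parameters and the simulated variances close to the formulas in part (b); the agreement is approximate, not exact, because it is subject to sampling error.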


If an estimator $\hat{\theta}$ of $\theta$ is unbiased, is a linear combination of the sample observations, and has a variance
that is less than or equal to that of any other unbiased estimator that is also a linear combination of the
sample observations, then $\hat{\theta}$ is said to be a best linear unbiased estimator (BLUE) for $\theta$. The following result
states that among all unbiased estimators for $\beta_0$ and $\beta_1$ that are linear in $Y_i$, the least-squares estimators
have the smallest variance.


GAUSS-MARKOV THEOREM

Theorem 8.2.2 Let $Y = \beta_0 + \beta_1 x + \varepsilon$ be the simple regression model such that for each $x_i$ fixed, each $Y_i$ is an
observable random variable and each $\varepsilon = \varepsilon_i$, $i = 1, 2, \ldots, n$, is an unobservable random variable. Also, let the
random variable $\varepsilon_i$ be such that $E[\varepsilon_i] = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma^2$, and $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ if $i \neq j$. Then the least-squares
estimators for $\beta_0$ and $\beta_1$ are best linear unbiased estimators.

It is important to note that even when the error variances are not constant, the least-squares estimators
can still be unbiased, but they no longer have minimum variance.


8.2.5 Estimation of Error Variance $\sigma^2$


The greater the variance, $\sigma^2$, of the random error $\varepsilon$, the larger will be the errors in the estimation of
model parameters $\beta_0$ and $\beta_1$. We can use already-calculated quantities to estimate this variability of
errors. It can be shown (see Exercise 8.2.1(b)) that

$$E(\mathrm{SSE}) = (n - 2)\sigma^2.$$

Thus, an unbiased estimator of the error variance, $\sigma^2$, is $\hat{\sigma}^2 = \mathrm{SSE}/(n - 2)$. We will denote
$\mathrm{SSE}/(n - 2)$ by MSE (mean square error).
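
As an illustration, SSE can be computed either directly from the residuals or through the shortcut $S_{yy} - \hat{\beta}_1 S_{xy}$, and MSE then follows by dividing by $n - 2$. The sketch below does both in plain Python for the data of Example 8.2.1; the variable names are illustrative. The two routes agree when full precision is carried, while a hand calculation that uses a rounded slope will differ slightly in the later decimal places.

```python
# Data of Example 8.2.1
x = [-1, 0, 2, -2, 5, 6, 8, 11, 12, -3]
y = [-5, -4, 2, -7, 6, 9, 13, 21, 20, -9]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((a - x_bar) ** 2 for a in x)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
s_yy = sum((b - y_bar) ** 2 for b in y)

b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Route 1: sum of squared residuals
sse_direct = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
# Route 2: the shortcut SSE = S_yy - b1 * S_xy
sse_shortcut = s_yy - b1 * s_xy

# Unbiased estimate of the error variance
mse = sse_direct / (n - 2)

print(f"SSE (residuals) = {sse_direct:.5f}")
print(f"SSE (shortcut)  = {sse_shortcut:.5f}")
print(f"MSE             = {mse:.5f}")
```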


EXERCISES 8.2 


8.2.1. For a random sample of size n,
(a) Show that the error sum of squares can be expressed by

$$\mathrm{SSE} = S_{yy} - \hat{\beta}_1 S_{xy}.$$

(b) Show that $E[\mathrm{SSE}] = (n - 2)\sigma^2$.


8.2.2. The following are midterm and final examination test scores for 10 students from a calculus 
class, where x denotes the midterm score and y denotes the final score for each student. 


x | 68 | 87 | 75 | 91 | 82 | 77 | 86 | 82 | 75 | 79
y | 74 | 79 | 80 | 93 | 88 | 79 | 97 | 95 | 89 | 92


(a) Calculate the least-squares regression line for these data. 
(b) Plot the points and the least-squares regression line on the same graph. 


8.2.3. The following data give the annual incomes (in thousands of dollars) and amounts (in 
thousands of dollars) of life insurance policies for eight persons. 


Annual income | 42 | 58 | 27 | 36 | 70 | 24 | 53 | 37
Life insurance | 150 | 175 | 25 | 75 | 250 | 50 | 250 | 100


(a) Calculate the least-squares regression line for these data. 
(b) Plot the points and the least-squares regression line on the same graph. 


8.2.4. Consider a simple linear model $Y = \beta_0 + \beta_1 x + \varepsilon$, with $\varepsilon \sim N(0, \sigma^2)$. Show that

$$\mathrm{cov}(\hat{\beta}_0, \hat{\beta}_1) = \frac{-\sigma^2 \sum_{i=1}^{n} x_i}{n S_{xx}}.$$

8.2.5. (a) Show that the least-squares estimates of $\beta_0$ and $\beta_1$ of a line can be expressed as

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$$

and

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}.$$

(b) Using part (a), show that the line fitted by the method of least squares passes through
the point $(\bar{x}, \bar{y})$.


8.2.6. Crickets make their chirping sounds by rapidly sliding one wing over the other. The faster 
they move their wings, the higher the number of chirping sounds that are produced. Scien- 
tists have noticed that crickets move their wings faster in warm temperatures than in cold 
temperatures (they also do this when they are threatened). Therefore, by listening to the 
pitch of the chirp of crickets, it is possible to tell the temperature of the air. The following 
table gives the number of cricket chirps per 13 seconds recorded at 10 different temperatures. 
Assume that the crickets are not threatened. 


Temperature 60 | 66 | 70 | 73 | 78 | 80 | 82 | 87 | 90 | 92 
Number of chirps | 20 | 25 | 31 | 33 | 36 | 39 | 42 | 48 | 49 | 52 


Calculate the least-squares regression line for these data and discuss its usefulness. 


8.2.7. Consider the regression model

$$y = \beta_1 x + \varepsilon$$

where $\varepsilon \sim N(0, \sigma^2)$. Show that

$$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}.$$


8.2.8. A farmer collected the following data, which show crop yields for various amounts of 
fertilizer used. 


Fertilizer (pounds/100 sq. ft) | 0 | 4 | 8 | 10 | 15 | 18 | 20 | 25
Yield (bushels) | 6 | 7 | 10 | 13 | 17 | 18 | 22 | 23


(a) Calculate the least-squares regression line for these data. 
(b) Plot the points and the least-squares regression line on the same graph. 


8.2.9. An economist desires to estimate a line that relates personal disposable income (DI) to
consumption expenditures (CE). Both DI and CE are in thousands of dollars. The following
gives the data for a random sample of nine households of size four.

DI | 25 | 22 | 19 | 36 | 40 | 47 | 28 | 52 | 60
CE | 21 | 20 | 17 | 28 | 34 | 41 | 25 | 45 | 51


(a) Calculate the least-squares regression line for these data. 
(b) Plot the points and the least-squares regression line on the same graph. 


8.2.10. The following data represent systolic blood pressure readings on 10 randomly selected 
females between ages 40 and 82. 


Age (x) | 63 | 70 | 74 | 82 | 60 | 44 | 80 | 71 | 71 | 41
Systolic (y) | 151 | 149 | 164 | 157 | 144 | 130 | 157 | 160 | 121 | 125


(a) Calculate the least-squares regression line for these data. 
(b) Plot the points and the least-squares regression line on the same graph. 


8.2.11. It is believed that exposure to solar radiation increases the pathogenesis of melanoma. Suppose
that the following data give sunspot relative number and age-adjusted total incidence (incidence
is the number of cases per 100,000 population) for 8 different years in a certain region.


Sunspot relative number | 104 | 12 | 40 | 75 | 110 | 180 | 175 | 30
Incidence total | 4.7 | 1.9 | 3.8 | 2.9 | 0.9 | 2.7 | 3.9 | 1.6


(a) Calculate the least-squares regression line for these data. 
(b) Plot the points and the least-squares regression line on the same graph. 


8.2.12. It is believed that the average size of a mammal species is a major factor in the period 
of gestation (the period of development in the uterus from conception until birth). In 
general, it is observed that the bigger the mammal is, the longer the gestation period. 
Table 8.2.1 gives adult mass in kilograms and gestation period in weeks of some species 
(source: http://www.saburchill.com/chapters/chap0037.html). 


Table 8.2.1
Species | Adult mass (kg) | Gestation period (weeks)
African elephant | 6000 | 88
Horse | 400 | 48
Grizzly bear | 400 | 30
Lion | 200 | 17
Wolf | 34 | 9
Badger | 12 | 8
Rabbit | 2 | 4.5
Squirrel | 0.5 | 3.5

Table 8.2.2
Species | Gestation period (weeks)
Indian elephant | 89.0
Camel | 57.0
Sea lion | 51.4
Dog | 8.7
Rat | 3.0
Hamster | 2.3


(a) Calculate the least-squares regression line for these data with adult mass as the 
independent variable. 

(b) Plot the points and the least-squares regression line on the same graph. 

(c) Calculate the least-squares regression line for these data with gestation period as the 
independent variable. 

(d) Assuming that the regression model of part (c) holds for all mammals, estimate the 
adult mass in kilograms for the mammals given in Table 8.2.2. 


8.3 INFERENCES ON THE LEAST-SQUARES ESTIMATORS 


Once we obtain the estimators of the slope $\beta_1$ and intercept $\beta_0$ of the model regression line, we
are in a position to use Theorem 8.2.1 to make inferences regarding these model parameters. Using
the properties of $\hat{\beta}_0$ and $\hat{\beta}_1$, in this section we study the confidence intervals and hypothesis tests
concerning these parameters.


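From Theorem 8.2.1, we can write

$$Z_1 = \frac{\hat{\beta}_1 - \beta_1}{\sigma/\sqrt{S_{xx}}} \sim N(0, 1).$$

Also, it can be shown that $\mathrm{SSE}/\sigma^2$ is independent of $\hat{\beta}_1$ and has a chi-square distribution with $n - 2$
degrees of freedom. Let the mean square error be defined by

$$\mathrm{MSE} = \frac{\mathrm{SSE}}{n - 2} = \frac{1}{n - 2} \sum_{i=1}^{n} \left[y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)\right]^2.$$

Then using Definition 4.2.2, we have

$$t_{\beta_1} = \frac{Z_1}{\sqrt{\dfrac{\mathrm{SSE}}{\sigma^2 (n - 2)}}} = \frac{\hat{\beta}_1 - \beta_1}{\sqrt{\mathrm{MSE}/S_{xx}}},$$

which follows the t-distribution with $n - 2$ degrees of freedom.


Similarly, let

$$Z_0 = \frac{\hat{\beta}_0 - \beta_0}{\sigma \left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)^{1/2}} \sim N(0, 1).$$

Also, it can be shown that $\hat{\beta}_0$ and SSE are independent. Hence,

$$t_{\beta_0} = \frac{Z_0}{\sqrt{\dfrac{\mathrm{SSE}}{\sigma^2 (n - 2)}}} = \frac{\hat{\beta}_0 - \beta_0}{\left[\mathrm{MSE}\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)\right]^{1/2}}$$

follows the t-distribution with $n - 2$ degrees of freedom.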


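From these derivations, we can obtain the following procedure for the confidence intervals for the
slope $\beta_1$ and for the intercept $\beta_0$.


PROCEDURE FOR OBTAINING CONFIDENCE INTERVALS FOR $\beta_0$ AND $\beta_1$

1. Compute $S_{xx}$, $S_{xy}$, $S_{yy}$, $\bar{y}$, and $\bar{x}$ as in the procedure for fitting a least-squares line.
2. Compute $\hat{\beta}_1$ and $\hat{\beta}_0$ using equations $\hat{\beta}_1 = S_{xy}/S_{xx}$ and $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$, respectively.
3. Compute SSE by $\mathrm{SSE} = S_{yy} - \hat{\beta}_1 S_{xy}$.
4. Define MSE (mean square error) to be

$$\mathrm{MSE} = \frac{\mathrm{SSE}}{n - 2},$$

where n = number of pairs of observations $(x_1, y_1), \ldots, (x_n, y_n)$.

5. A $(1 - \alpha)100\%$ confidence interval for $\beta_1$ is given by

$$\left(\hat{\beta}_1 - t_{\alpha/2, n-2} \sqrt{\frac{\mathrm{MSE}}{S_{xx}}},\;\; \hat{\beta}_1 + t_{\alpha/2, n-2} \sqrt{\frac{\mathrm{MSE}}{S_{xx}}}\right),$$

where $t_{\alpha/2}$ is the upper tail $\alpha/2$-point based on a t-distribution with $(n - 2)$ degrees of freedom.
6. A $(1 - \alpha)100\%$ confidence interval for $\beta_0$ is given by

$$\left(\hat{\beta}_0 - t_{\alpha/2, n-2} \left[\mathrm{MSE}\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\right]^{1/2},\;\; \hat{\beta}_0 + t_{\alpha/2, n-2} \left[\mathrm{MSE}\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\right]^{1/2}\right).$$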


We illustrate this procedure for obtaining confidence limits with an example.


Example 8.3.1 

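For the data of Example 8.2.1:
(a) Construct a 95% confidence interval for $\beta_0$ and interpret.
(b) Construct a 95% confidence interval for $\beta_1$ and interpret.


Solution
The following calculations were obtained in Example 8.2.1:

$$S_{xx} = 263.6, \quad S_{xy} = 534.2, \quad \bar{y} = 4.6, \quad\text{and}\quad \bar{x} = 3.8.$$

Also,

$$\hat{\beta}_1 = 2.0266, \quad \hat{\beta}_0 = -3.1011.$$

In addition to those calculations, we can compute

$$\sum_{i=1}^{n} y_i^2 = 1302 \quad\text{and}\quad S_{yy} = \sum_{i=1}^{n} y_i^2 - \frac{\left(\sum_{i=1}^{n} y_i\right)^2}{n} = 1302 - \frac{(46)^2}{10} = 1090.4.$$

Now,

$$\mathrm{SSE} = S_{yy} - \hat{\beta}_1 S_{xy} = 1090.4 - (2.0266)(534.2) = 7.79028.$$

Hence,

$$\mathrm{MSE} = \frac{\mathrm{SSE}}{n - 2} = \frac{7.79028}{8} = 0.973785.$$

Now from the t-table, we have $t_{0.025, 8} = 2.306$.
(a) A 95% confidence interval for $\beta_0$ is given by

$$\left(\hat{\beta}_0 - t_{\alpha/2, n-2} \left[\mathrm{MSE}\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\right]^{1/2},\;\; \hat{\beta}_0 + t_{\alpha/2, n-2} \left[\mathrm{MSE}\left(\frac{1}{n} + \frac{\bar{x}^2}{S_{xx}}\right)\right]^{1/2}\right)$$

$$= \left(-3.1011 - (2.306)\left[(0.973785)\left(\frac{1}{10} + \frac{(3.8)^2}{263.6}\right)\right]^{1/2},\;\; -3.1011 + (2.306)\left[(0.973785)\left(\frac{1}{10} + \frac{(3.8)^2}{263.6}\right)\right]^{1/2}\right).$$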

From which we obtain a 95% confidence interval for $\beta_0$ as $(-3.9964, -2.2058)$. Thus, we can conclude
with 95% confidence that the true value of the intercept, $\beta_0$, is between $-3.9964$ and $-2.2058$.

(b) A 95% confidence interval for $\beta_1$ is given by

$$\left(\hat{\beta}_1 - t_{\alpha/2, n-2} \sqrt{\frac{\mathrm{MSE}}{S_{xx}}},\;\; \hat{\beta}_1 + t_{\alpha/2, n-2} \sqrt{\frac{\mathrm{MSE}}{S_{xx}}}\right)$$

$$= \left(2.0266 - (2.306)\sqrt{\frac{0.973785}{263.6}},\;\; 2.0266 + (2.306)\sqrt{\frac{0.973785}{263.6}}\right),$$

from which we obtain a 95% confidence interval for $\beta_1$ as $(1.8864, 2.1668)$. Thus, we can conclude with
95% confidence that the true value of the slope of the linear regression model is between 1.8864 and 2.1668.
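
The confidence-interval procedure can be sketched in code as follows. This is a plain-Python illustration, not a reference implementation: the function name `slope_intercept_intervals` is hypothetical, and the critical value $t_{\alpha/2, n-2}$ is passed in from a t-table (2.306 for 8 degrees of freedom at the 95% level) because the Python standard library has no t-quantile function. The code does not round intermediate results, so its endpoints can differ from a hand calculation in the third decimal place.

```python
from math import sqrt


def slope_intercept_intervals(x, y, t_crit):
    """Return ((lo, hi) for beta_0, (lo, hi) for beta_1) given a t critical value."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_yy = sum((b - y_bar) ** 2 for b in y)

    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    mse = (s_yy - b1 * s_xy) / (n - 2)  # SSE / (n - 2)

    half_b1 = t_crit * sqrt(mse / s_xx)
    half_b0 = t_crit * sqrt(mse * (1 / n + x_bar ** 2 / s_xx))
    return (b0 - half_b0, b0 + half_b0), (b1 - half_b1, b1 + half_b1)


# Data of Example 8.2.1; 2.306 is the tabulated t value for 8 degrees of freedom
x = [-1, 0, 2, -2, 5, 6, 8, 11, 12, -3]
y = [-5, -4, 2, -7, 6, 9, 13, 21, 20, -9]
ci_b0, ci_b1 = slope_intercept_intervals(x, y, t_crit=2.306)

print(f"95% CI for the intercept: ({ci_b0[0]:.4f}, {ci_b0[1]:.4f})")
print(f"95% CI for the slope:     ({ci_b1[0]:.4f}, {ci_b1[1]:.4f})")
```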


One of the assumptions for the linear regression model that we have made is that the variance of the
errors is constant and independent of x. Errors with this property are called homoscedastic. If the
variance of the errors is not constant, the errors are called heteroscedastic. In the heteroscedastic case,
standard errors and confidence intervals based on the assumption that $s^2$ is an estimate of $\sigma^2$ may be
somewhat deceptive.


Now we introduce hypothesis testing concerning the slope and intercept of the fitted least-squares
line. We use $t_{\beta_0}$ and $t_{\beta_1}$, defined earlier, as the test statistics for testing hypotheses concerning $\beta_0$ and
$\beta_1$, respectively. The usual one- and two-sided alternatives apply. We proceed to summarize these test
procedures.


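HYPOTHESIS TEST FOR $\beta_0$

One-sided test
$H_0: \beta_0 = \beta_{00}$ ($\beta_{00}$ is a specific value of $\beta_0$)
$H_a: \beta_0 > \beta_{00}$ or $\beta_0 < \beta_{00}$
Test statistic:

$$t_{\beta_0} = \frac{\hat{\beta}_0 - \beta_{00}}{\left[\mathrm{MSE}\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)\right]^{1/2}}$$

Rejection region:
$t > t_{\alpha, (n-2)}$ (upper tail region)
$t < -t_{\alpha, (n-2)}$ (lower tail region)

Two-sided test
$H_0: \beta_0 = \beta_{00}$
$H_a: \beta_0 \neq \beta_{00}$
Test statistic:

$$t_{\beta_0} = \frac{\hat{\beta}_0 - \beta_{00}}{\left[\mathrm{MSE}\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)\right]^{1/2}}$$

Rejection region:
$|t| > t_{\alpha/2, (n-2)}$

Decision: If $t_{\beta_0}$ falls in the rejection region, reject the null hypothesis at level of significance $\alpha$.
Assumptions: Assume that the errors $\varepsilon_i$, $i = 1, \ldots, n$, are independent and normally distributed with
$E(\varepsilon_i) = 0$, $i = 1, \ldots, n$, and $\mathrm{Var}(\varepsilon_i) = \sigma^2$, $i = 1, \ldots, n$.


We now illustrate this procedure with the following example.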


Example 8.3.2 
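Using the data given in Example 8.2.1, test the hypothesis $H_0: \beta_0 = -3$ versus $H_a: \beta_0 \neq -3$ using the
0.05 level of significance.


Solution

We test $H_0: \beta_0 = -3$ versus $H_a: \beta_0 \neq -3$.

Here $\beta_{00} = -3$. The rejection region is $t < -2.306$ or $t > 2.306$.
From the calculations of the previous example, we have

$$t_{\beta_0} = \frac{\hat{\beta}_0 - \beta_{00}}{\left[\mathrm{MSE}\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{S_{xx}}\right)\right]^{1/2}}
= \frac{-3.1011 - (-3)}{\left[(0.973785)\left(\dfrac{1}{10} + \dfrac{(3.8)^2}{263.6}\right)\right]^{1/2}} = -0.26041.$$

Because the test statistic does not fall in the rejection region, at $\alpha = 0.05$, we do not reject $H_0$.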


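HYPOTHESIS TEST FOR $\beta_1$

One-sided test
$H_0: \beta_1 = \beta_{10}$ ($\beta_{10}$ is a specific value of $\beta_1$)
$H_a: \beta_1 > \beta_{10}$ or $\beta_1 < \beta_{10}$
Test statistic:

$$t_{\beta_1} = \frac{\hat{\beta}_1 - \beta_{10}}{\sqrt{\mathrm{MSE}/S_{xx}}}$$

Rejection region:
$t > t_{\alpha, (n-2)}$ (upper tail region)
$t < -t_{\alpha, (n-2)}$ (lower tail region)

Two-sided test
$H_0: \beta_1 = \beta_{10}$
$H_a: \beta_1 \neq \beta_{10}$
Test statistic:

$$t_{\beta_1} = \frac{\hat{\beta}_1 - \beta_{10}}{\sqrt{\mathrm{MSE}/S_{xx}}}$$

Rejection region:
$|t| > t_{\alpha/2, (n-2)}$

Decision: If $t_{\beta_1}$ falls in the rejection region, reject the null hypothesis at level of significance $\alpha$.
Assumptions: Assume that the errors $\varepsilon_i$, $i = 1, \ldots, n$, are independent and normally distributed with
$E(\varepsilon_i) = 0$, $i = 1, \ldots, n$, and $\mathrm{Var}(\varepsilon_i) = \sigma^2$, $i = 1, \ldots, n$.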


The test of hypothesis $H_0: \beta_1 = 0$ answers the question, Is the regression significant? If $\beta_1 = 0$, we
conclude that there is no significant linear relationship between X and Y, and hence the independent
variable X is not important in predicting the values of Y with a linear model. Note
that if $\beta_1 = 0$, then the model becomes $y = \beta_0 + \varepsilon$; that is, the regression line is actually a horizontal
line through the intercept, $\beta_0$. Thus, the question of the importance of the independent variable in the
regression model translates into the narrower question of testing the hypothesis $H_0: \beta_1 = 0$.
Example 8.3.3 
Using the data given in Example 8.2.1, test the hypothesis $H_0: \beta_1 = 2$ versus $H_a: \beta_1 \neq 2$ using the 0.05
level of significance.


Solution
We test

$$H_0: \beta_1 = 2 \quad\text{vs.}\quad H_a: \beta_1 \neq 2.$$

We know that $\hat{\beta}_1 = 2.0266$.
For $\alpha = 0.05$ and $n = 10$, the rejection region is $t < -2.306$ or $t > 2.306$. The test statistic is

$$t_{\beta_1} = \frac{\hat{\beta}_1 - \beta_{10}}{\sqrt{\mathrm{MSE}/S_{xx}}} = \frac{2.0266 - 2}{\sqrt{\dfrac{0.973785}{263.6}}} = 0.4376.$$

Because the test statistic does not fall in the rejection region, at $\alpha = 0.05$, we do not reject $H_0$. Thus, for
$\alpha = 0.05$, the given data are consistent with the null hypothesis that the true value of the slope, $\beta_1$, of the regression
line is equal to 2.
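
The two test statistics used in Examples 8.3.2 and 8.3.3 can be computed together. The sketch below is plain Python with a hypothetical helper `t_statistics`; the critical value 2.306 is taken from the t-table, as in the examples. Because full precision is carried throughout, the statistics differ slightly from the hand-computed values, although the decisions are the same.

```python
from math import sqrt


def t_statistics(x, y, beta_00, beta_10):
    """Return (t for H0: beta_0 = beta_00, t for H0: beta_1 = beta_10)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_yy = sum((b - y_bar) ** 2 for b in y)

    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    mse = (s_yy - b1 * s_xy) / (n - 2)

    t_b0 = (b0 - beta_00) / sqrt(mse * (1 / n + x_bar ** 2 / s_xx))
    t_b1 = (b1 - beta_10) / sqrt(mse / s_xx)
    return t_b0, t_b1


# Data of Example 8.2.1, hypothesized values from Examples 8.3.2 and 8.3.3
x = [-1, 0, 2, -2, 5, 6, 8, 11, 12, -3]
y = [-5, -4, 2, -7, 6, 9, 13, 21, 20, -9]
t_b0, t_b1 = t_statistics(x, y, beta_00=-3.0, beta_10=2.0)

t_crit = 2.306  # two-sided test, alpha = 0.05, 8 degrees of freedom (from a t-table)
for name, t in (("beta_0", t_b0), ("beta_1", t_b1)):
    decision = "reject H0" if abs(t) > t_crit else "do not reject H0"
    print(f"{name}: t = {t:.4f} -> {decision}")
```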
Another problem closely related to the problem of estimating the regression coefficients $\beta_0$ and $\beta_1$ is
that of estimating the mean of the distribution of Y for a given value of x, that is, estimating $\beta_0 + \beta_1 x$.
For a fixed value of x, say $x_0$, we have the following confidence limits.


A $(1 - \alpha)100\%$ confidence interval for $\beta_0 + \beta_1 x_0$ is given by

$$(\hat{\beta}_0 + \hat{\beta}_1 x_0) \pm t_{\alpha/2} S_e \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}},$$

where $t_{\alpha/2}$ is based on $n - 2$ degrees of freedom and

$$S_e = \sqrt{\frac{S_{xx} S_{yy} - (S_{xy})^2}{(n - 2) S_{xx}}}.$$

We could use the data from the previous example to easily calculate a confidence interval for $\beta_0 + \beta_1 x$.
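
As an illustration of the confidence interval for the mean response, the following plain-Python sketch evaluates it for the data of Example 8.2.1. The choice $x_0 = 5$ and the function name `mean_response_interval` are assumptions made for this illustration, and the critical value 2.306 is read from a t-table for 8 degrees of freedom.

```python
from math import sqrt


def mean_response_interval(x, y, x0, t_crit):
    """Confidence interval for beta_0 + beta_1 * x0, given a t critical value."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_yy = sum((b - y_bar) ** 2 for b in y)

    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar

    # S_e^2 = (S_xx * S_yy - S_xy^2) / ((n - 2) * S_xx), which equals SSE / (n - 2)
    s_e = sqrt((s_xx * s_yy - s_xy ** 2) / ((n - 2) * s_xx))

    center = b0 + b1 * x0
    half_width = t_crit * s_e * sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx)
    return center - half_width, center + half_width


x = [-1, 0, 2, -2, 5, 6, 8, 11, 12, -3]
y = [-5, -4, 2, -7, 6, 9, 13, 21, 20, -9]
lo, hi = mean_response_interval(x, y, x0=5, t_crit=2.306)
print(f"95% CI for the mean response at x0 = 5: ({lo:.4f}, {hi:.4f})")
```

The interval is narrowest at $x_0 = \bar{x}$ and widens as $x_0$ moves away from the mean of the observed x-values, which is visible in the $(x_0 - \bar{x})^2$ term.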


8.3.1 Analysis of Variance (ANOVA) Approach to Regression


Another approach to hypothesis testing is based on ANOVA. A detailed explanation of this approach 
is given in Chapter 10. Here we present necessary steps for regression. The main reason for this 
presentation is the fact that most of the major statistical software outputs for regression analysis (see 
Section 8.9) are given in the form of ANOVA tables. 


It can be verified (see Exercise 8.3.7) that

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2.$$

Denoting

$$\mathrm{SST} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \quad \mathrm{SSE} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad\text{and}\quad \mathrm{SSR} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2,$$

the foregoing equation can be written as

$$\mathrm{SST} = \mathrm{SSR} + \mathrm{SSE}.$$

Note that the total sum of squares (SST) is a measure of the variation of the $y_i$'s around the mean $\bar{y}$,
and SSE is the residual or error sum of squares that measures the lack of fit of the regression model.
Hence, SSR (sum of squares of regression or model) measures the variation that can be explained by
the regression model.


We saw that to test the hypothesis $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$, the statistic

$$t_{\beta_1} = \frac{\hat{\beta}_1}{\sqrt{\mathrm{MSE}/S_{xx}}}$$

was used, where $t_{\beta_1}$ follows a t-distribution with $(n - 2)$ degrees of freedom. From Exercise 4.2.18,
we know that

$$t_{\beta_1}^2 = \frac{\hat{\beta}_1^2}{\mathrm{MSE}/S_{xx}}$$

follows an F-distribution with numerator degrees of freedom 1 and denominator degrees of freedom
$(n - 2)$. We can also verify that

$$t_{\beta_1}^2 = \frac{\hat{\beta}_1^2 S_{xx}}{\mathrm{MSE}} = \frac{\mathrm{SSR}}{\mathrm{MSE}} = \frac{\mathrm{MSR}}{\mathrm{MSE}}, \quad\text{where}\quad \mathrm{MSR} = \frac{\mathrm{SSR}}{1}.$$

Thus, to test $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$, we could use the statistic

$$\frac{\mathrm{MSR}}{\mathrm{MSE}} \sim F(1, n - 2)$$

and reject $H_0$ if

$$\frac{\mathrm{MSR}}{\mathrm{MSE}} > F_{\alpha}(1, n - 2).$$

These can be summarized by Table 8.1, known as the ANOVA table.

Table 8.1 ANOVA Table for Simple Regression
Source of variation | Degrees of freedom | Sum of squares | Mean sum of squares | F-ratio
Regression (model) | 1 | SSR | MSR = SSR/1 | MSR/MSE
Error (residuals) | n - 2 | SSE | MSE = SSE/(n - 2) |
Total | n - 1 | SST | |


The last column in the ANOVA table gives the statistic (MSR)/(MSE). It is also customary to give
another column with the p-value of the test.

Example 8.3.4 
In a study of baseline characteristics of 20 patients with foot ulcers, we want to see the relationship between
the stage of ulcer (determined using the Yarkony-Kirk scale, a higher number indicating a more severe stage,
with range 1 to 6), and duration of ulcer (in days). Suppose we have the data shown in Table 8.2.

(a) Give an ANOVA table to test $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$. What is the conclusion of the test based
on $\alpha = 0.05$?
(b) Write down the expression for the least-squares line.


Table 8.2
Stage of ulcer (x) | 4 | 3 | 5 | 4 | 4 | 3 | 3 | 4 | 6 | 3
Duration (d) | 18 | 6 | 20 | 15 | 16 | 15 | 10 | 18 | 26 | 15
Stage of ulcer (x) | 3 | 4 | 3 | 2 | 3 | 2 | 2 | 3 | 5 | 6
Duration (d) | 8 | 16 | 17 | 6 | 7 | 7 | 8 | 11 | 21 | 24


Solution
(a) We test $H_0: \beta_1 = 0$ vs. $H_a: \beta_1 \neq 0$. We will use Minitab to generate the ANOVA table (Table 8.3).
Because the p-value is less than 0.001, for $\alpha = 0.05$, we reject the null hypothesis that $\beta_1 = 0$ and
conclude that there is a relationship between the stage of ulcer and its duration.

Table 8.3
Source of variation | Degrees of freedom | Sum of squares | Mean sum of squares | F-ratio | p-Value
Regression (model) | 1 | 570.04 | 570.04 | 77.05 | 0.000
Error (residuals) | 18 | 133.16 | 7.40 | |
Total | 19 | 703.20 | | |

(b) Again, using the Minitab output, we get the least-squares line as

$$\hat{d} = 4.61x - 2.40.$$
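
The entries of the ANOVA table can be recomputed without statistical software. The following plain-Python sketch does so for the ulcer data; it is an illustration rather than a reproduction of the Minitab run. One data value is an assumption: the third stage value is garbled in the extracted table and is taken here to be 5, which is the value that reproduces the sums of squares reported in Table 8.3. The p-value is not computed because the Python standard library has no F-distribution function.

```python
# Ulcer data of Table 8.2 (third stage value assumed to be 5; see the note above)
stage = [4, 3, 5, 4, 4, 3, 3, 4, 6, 3,
         3, 4, 3, 2, 3, 2, 2, 3, 5, 6]
duration = [18, 6, 20, 15, 16, 15, 10, 18, 26, 15,
            8, 16, 17, 6, 7, 7, 8, 11, 21, 24]
n = len(stage)

x_bar = sum(stage) / n
d_bar = sum(duration) / n
s_xx = sum((a - x_bar) ** 2 for a in stage)
s_xy = sum((a - x_bar) * (d - d_bar) for a, d in zip(stage, duration))

b1 = s_xy / s_xx
b0 = d_bar - b1 * x_bar
fitted = [b0 + b1 * a for a in stage]

# Sums of squares: SST = SSR + SSE
sst = sum((d - d_bar) ** 2 for d in duration)
sse = sum((d - f) ** 2 for d, f in zip(duration, fitted))
ssr = sum((f - d_bar) ** 2 for f in fitted)

msr = ssr / 1
mse = sse / (n - 2)
f_ratio = msr / mse

print(f"fitted line: d = {b1:.2f} x + ({b0:.2f})")
print("Source      df        SS        MS        F")
print(f"Regression  {1:2d}  {ssr:8.2f}  {msr:8.2f}  {f_ratio:7.2f}")
print(f"Error       {n - 2:2d}  {sse:8.2f}  {mse:8.2f}")
print(f"Total       {n - 1:2d}  {sst:8.2f}")
```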


EXERCISES 8.3 


8.3.1. An experiment was conducted to observe the effect of an increase in temperature on the 
potency of an antibiotic. Three one ounce portions of the antibiotic were stored for equal 
lengths of time at each of the following Fahrenheit temperatures: 40°, 55°, 70°, and 90°. The 
potency readings observed at the end of the experimental period were 


Potency reading, y | 49 | 38 | 27 | 24 | 38 | 33 | 19 | 28 | 16 | 18 | 23
Temperature, x | 40° | 55° | 70° | 90°


(a) Find the least-squares line appropriate for these data.
(b) Plot the points and graph the line as a check on your calculations.
(c) Calculate the 95% confidence intervals for $\beta_0$ and $\beta_1$, respectively.


8.3.2. Consider the data


x | 38 | 26 | 48 | 22 | 40 | 15 | 30 | 33
y | 10 | 11 | 16 | 8 | 12 | 5 | 10 | 11

(a) Find the least-squares line appropriate for these data.
(b) Plot the points and graph the line as a check on your calculations.
(c) Calculate the 95% confidence intervals for $\beta_0$ and $\beta_1$, respectively.


8.3.3. Show that $\bar{Y}$ and $\hat{\beta}_1$ are independent, under the usual assumptions of a simple linear regression
model.


8.3.4. Using the data of Exercise 8.2.10, calculate the 95% confidence intervals for $\beta_0$ and $\beta_1$,
respectively.

8.3.5. The following data represent survival time in days after a heart transplant and patient age in
years at the time of transplant for 10 randomly selected patients.


Age at transplant | 28 | 41 | 46 | 53 | 39 | 36 | 47 | 29 | 48 | 44
Survival time, in days | 7 | 278 | 44 | 48 | 406 | 382 | 1995 | 176 | 323 | 1846


(a) Find the least-squares line appropriate for these data.
(b) Plot the points and graph the line.
(c) Calculate the 95% confidence intervals for $\beta_0$ and $\beta_1$, respectively.

8.3.6. The following data represent weights of cigarettes (g) from different manufacturers and their 
nicotine contents (mg). 


Weight | 15.8 | 14.9 | 9.0 | 4.5 | 15.0 | 17.0 | 8.6 | 12.0 | 4.1 | 16.0
Nicotine | 0.957 | 0.886 | 0.852 | 0.911 | 0.889 | 0.919 | 0.969 | 1.118 | 0.946 | 1.094


(a) Find the least-squares line appropriate for these data.
(b) Plot the points and graph the line. Do you think the linear regression is appropriate?
(c) Calculate the 95% confidence intervals for $\beta_0$ and $\beta_1$, respectively.


8.3.7. Show that

$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2.$$


8.4 PREDICTING A PARTICULAR VALUE OF Y 


In the earlier sections, we have seen how to fit a least-squares line for a given set of data. Using
this line, we could also estimate E(Y) for any given value of x. Instead of obtaining this mean value, we may
be interested in predicting the particular value of Y for a given x. In fact, one of the primary uses of
the estimated regression line is to predict the response value of Y for a given value of x. Prediction
is very important in many real-world problems; for example, in economics one may be
interested in a particular gain associated with an investment.


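Let $\hat{Y}_0$ denote a predictor of a particular value $Y = Y_0$, and let the corresponding value of x be $x_0$.
We shall choose $\hat{Y}_0$ to be the estimate of $E(Y \mid x_0)$, that is, $\hat{Y}_0 = \hat{\beta}_0 + \hat{\beta}_1 x_0$. Then the error
$\eta$ of the predictor in comparison to the particular value $Y_0$ is

$$\eta = Y_0 - \hat{Y}_0.$$

Both $Y_0$ and $\hat{Y}_0$ are normal random variables, and the error is a linear function of $Y_0$ and $\hat{Y}_0$. This means
that $\eta$ itself is normally distributed. Also, because $E(\hat{Y}_0) = E(Y_0)$, we have

$$E(\eta) = E(Y_0) - E(\hat{Y}_0) = 0.$$

Furthermore,

$$\mathrm{Var}(\eta) = \mathrm{Var}(Y_0 - \hat{Y}_0) = \mathrm{Var}(Y_0) + \mathrm{Var}(\hat{Y}_0) - 2\,\mathrm{Cov}(Y_0, \hat{Y}_0).$$

We can consider $Y_0$ and $\hat{Y}_0$ as independent, because we are predicting a different value of Y, not used
in the calculation of $\hat{Y}_0$. Therefore, $\mathrm{Cov}(Y_0, \hat{Y}_0) = 0$. In that case

$$\mathrm{Var}(\eta) = \mathrm{Var}(Y_0) + \mathrm{Var}(\hat{Y}_0) = \sigma^2 + \left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]\sigma^2 = \left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]\sigma^2.$$

Hence, the error of predicting a particular value of Y, given $x = x_0$, is normally distributed with mean zero
and variance $\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]\sigma^2$.

That is,

$$\eta \sim N\left(0, \left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}\right]\sigma^2\right)$$

and

$$Z = \frac{Y_0 - \hat{Y}_0}{\sigma\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}} \sim N(0, 1).$$

If we substitute the sample standard deviation S for $\sigma$, then we can show that

$$T = \frac{Y_0 - \hat{Y}_0}{S\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}}$$

follows the t-distribution with $n - (k + 1)$ degrees of freedom, which is $n - 2$ here because there is $k = 1$
independent variable. Using this fact, we now give a
prediction interval for the random variable Y, the response of a given situation.


We know that

$$P\left(-t_{\alpha/2} < T < t_{\alpha/2}\right) = 1 - \alpha.$$

Substituting for T, we have

$$P\left(-t_{\alpha/2} < \frac{Y_0 - \hat{Y}_0}{S\sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{S_{xx}}}} < t_{\alpha/2}\right) = 1 - \alpha,$$

which implies that

$$P\left(\hat{Y}_0 - t_{\alpha/2} S \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}} < Y_0 < \hat{Y}_0 + t_{\alpha/2} S \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}\right) = 1 - \alpha.$$

Hence, we have the following.


A $(1 - \alpha)100\%$ prediction interval for Y at $x = x_0$ is

$$\hat{Y}_0 \pm t_{\alpha/2} S \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}},$$

where $t_{\alpha/2}$ is based on $(n - 2)$ degrees of freedom and $S^2 = \dfrac{\mathrm{SSE}}{n - 2}$.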


We illustrate this statistical procedure with the following example. 


Example 8.4.1 
Using the data given in Example 8.2.1, obtain a 95% prediction interval at x = 5.


Solution
We have shown that $\hat{y} = -3.1011 + 2.0266x$. Hence, at x = 5, $\hat{y} = 7.0319$.

Also $\bar{x} = 3.8$, $S_{xx} = 263.6$, SSE = 7.79028, and $S = \sqrt{7.79028/8} = 0.98681$.

From the t-table, $t_{0.025, 8} = 2.306$.
Thus, we have

$$7.0319 \pm (2.306)(0.98681)\sqrt{1 + \frac{1}{10} + \frac{(5 - 3.8)^2}{263.6}},$$

which gives the 95% prediction interval as (4.6393, 9.4245).
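
The same prediction interval can be computed in code. The sketch below is plain Python with a hypothetical helper `prediction_interval`; the critical value 2.306 is taken from the t-table as in the example. Since no intermediate value is rounded, the endpoints agree with the hand calculation to about two decimal places rather than exactly.

```python
from math import sqrt


def prediction_interval(x, y, x0, t_crit):
    """Prediction interval for a new response Y at x = x0, given a t critical value."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_yy = sum((b - y_bar) ** 2 for b in y)

    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    s = sqrt((s_yy - b1 * s_xy) / (n - 2))  # S = sqrt(SSE / (n - 2))

    y_hat = b0 + b1 * x0
    # The leading 1 accounts for the variance of the new observation itself
    half_width = t_crit * s * sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx)
    return y_hat - half_width, y_hat + half_width


# Data of Example 8.2.1
x = [-1, 0, 2, -2, 5, 6, 8, 11, 12, -3]
y = [-5, -4, 2, -7, 6, 9, 13, 21, 20, -9]
lo, hi = prediction_interval(x, y, x0=5, t_crit=2.306)
print(f"95% prediction interval at x = 5: ({lo:.4f}, {hi:.4f})")
```

The extra 1 under the square root is what makes a prediction interval wider than the confidence interval for the mean response at the same x-value.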


We can conclude with 95% confidence that the true value of Y at the point x = 5 will be somewhere between
4.6393 and 9.4245.


EXERCISES 8.4


8.4.1. The following are midterm and final examination test scores for 10 calculus students, where 
x denotes the midterm score and y denotes the final score for each student. 


x | 68 | 87 | 75 | 91 | 82 | 77 | 86 | 82 | 75 | 79
y | 74 | 89 | 80 | 93 | 88 | 79 | 97 | 95 | 89 | 92


Obtain a 95% prediction interval for x = 92 and interpret its meaning. 


8.4.2. The following data give the annual incomes (in thousands of dollars) and amounts (in 
thousands of dollars) of life insurance policies for eight persons. 


Annual income  | 42  | 58  | 27 | 36 | 70  | 24 | 53  | 37
Life insurance | 150 | 175 | 25 | 75 | 250 | 50 | 250 | 100


Obtain a 90% prediction interval for x = 59 and interpret its meaning. 


8.4.3. For the following data, construct a 95% prediction interval for x = 12. 


x | 1  | 3  | 5  | 7  | 9  | 11
y | 16 | 36 | 43 | 65 | 80 | 88


8.4.4. The data given below are from a random sample of height (in inches) and weight (in pounds) 
of seven basketball players. 


Height | 73 | 83 | 77 | 80 | 85 | 71 | 80
Weight | 186 | 234 | 208 | 237 | 265 | 190 | 220 


Construct a 99% prediction interval for height equal to 90. Interpret the result and state any 
assumptions. 


8.4.5. For the data in Exercise 8.2.10, obtain a 95% prediction interval for the age, x = 85, interpret 
and state any assumptions. 


8.5 CORRELATION ANALYSIS 


Using the regression model, we can evaluate the magnitude of change in the dependent variable due 
to certain changes in the independent variables. One of the main assumptions we have used is that 
the independent variables are known. However, there are problems where the x-values as well as
the y-values are values assumed by random variables. This would be the case, for example, if we study the
relationship between secondhand smoking and the incidence of a certain disease. Here, basically, one
treats $X$ as random, and hence the simple linear regression model is

$$Y = \beta_0 + \beta_1 X + \varepsilon.$$

This implies that

$$E(Y \mid X = x) = \beta_0 + \beta_1 x$$

and one looks for dependence of X and Y. Once we have determined that there is a relationship
between the variables, the next question that arises is how closely the variables are associated. 
A measure of the amount of linear dependency of two random variables is the correlation. The 
correlation coefficient tells us how strongly two variables are linearly related. The statistical method 
used to measure the degree of correlation is referred to as correlation analysis. We will assume that 
the vector random variable (X, Y) has a bivariate normal distribution. In this case, it can be shown 
that 


$$E(Y \mid X = x) = \beta_0 + \beta_1 x.$$


At times, our interest may not be in the linear relationship; rather, we may merely want to know 
whether X and Y are independent random variables. If (X, Y) has a bivariate normal distribution, 
then testing for independence is equivalent to testing that the correlation coefficient, $\rho = \sigma_{XY}/(\sigma_X \sigma_Y)$,
is equal to zero. Note that $\rho$ is positive if X and Y increase together and $\rho$ is negative if Y decreases as X
increases. If $\rho = 0$, there is no relation between X and Y; if $\rho > 0$, there is a positive relation between
X and Y (increasing slope); and when $\rho < 0$, we have a negative relationship (decreasing slope).
Thus, the correlation coefficient can be used to measure how well the linear regression model fits 
the data. 


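Let $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ be a random sample from a bivariate normal distribution. The
maximum likelihood estimator of $\rho$ is the sample correlation coefficient, denoted by $\hat{\rho}$ or $r$,

$$r = \hat{\rho} = \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}. \tag{8.7}$$

Equivalently, we can rewrite (8.7) as

$$r = \frac{n\sum_{i=1}^{n} X_i Y_i - \sum_{i=1}^{n} X_i \sum_{i=1}^{n} Y_i}{\sqrt{n\sum_{i=1}^{n} X_i^2 - \left(\sum_{i=1}^{n} X_i\right)^2}\ \sqrt{n\sum_{i=1}^{n} Y_i^2 - \left(\sum_{i=1}^{n} Y_i\right)^2}}.$$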


We can see that $-1 \le r \le 1$. The value of r could readily be obtained by the calculations one already
has performed for the regression analysis. Observe that the numerator of r is exactly the same as the
numerator of $\hat{\beta}_1$ derived in Section 8.2. Because the denominators of both $\hat{\beta}_1$ and r are nonnegative,
they have the same sign. It can be shown that this estimator is not unbiased. If the value of r is near
or equal to zero, this implies little or no linear relationship between x and y. On the other hand, the
closer r is to 1 or -1, the stronger the linear relationship between x and y. When r > 0, values of
y increase as the values of x increase, and the data set is said to be positively correlated. When r < 0,
values of y decrease as the values of x increase, and the data set is said to be negatively correlated. In this
book, we use the term correlation only when referring to linear relationships. In actual practice we can
use the value of r to decide whether it is appropriate to develop linear regression models in a given
situation. As a rule of thumb, if r > 0.30 or r < -0.30, we proceed with developing a linear regression
model. However, a value much closer to 1 or -1 is desirable. For example, if in a given problem
r = 0.77, then $r^2 \approx 0.59$, which conveys to us that approximately 59% of the variation in y is explained
by its linear relationship with x.


The probability distribution for r is difficult to obtain. For large samples, this difficulty could be
overcome by using the fact that the Fisher z-transform, given by

$$z = \frac{1}{2}\ln\left[\frac{1 + r}{1 - r}\right],$$

is approximately normally distributed with mean $\mu_z = \frac{1}{2}\ln\left[\frac{1 + \rho}{1 - \rho}\right]$ and variance
$\sigma_z^2 = 1/(n - 3)$. Thus, for large random samples, we can test hypotheses about $\rho$ using the approximate
test statistic:

$$Z = \frac{z - \mu_z}{\sigma_z} = \frac{\frac{1}{2}\ln\left(\frac{1 + r}{1 - r}\right) - \frac{1}{2}\ln\left(\frac{1 + \rho}{1 - \rho}\right)}{\frac{1}{\sqrt{n - 3}}}.$$


For example, suppose we are interested in testing the hypothesis that the true value of $\rho$ is a specific
number, say, $\rho_0$, with a certain value of $\alpha$. We can proceed to make a decision by following the
procedure given next.

HYPOTHESIS TEST FOR $\rho$

One-sided test                                   Two-sided test
$H_0: \rho = \rho_0$                             $H_0: \rho = \rho_0$
$H_a: \rho > \rho_0$ or $H_a: \rho < \rho_0$     $H_a: \rho \neq \rho_0$

Test statistic (the same in both cases):

$$Z = \frac{\frac{1}{2}\ln\left(\frac{1 + r}{1 - r}\right) - \frac{1}{2}\ln\left(\frac{1 + \rho_0}{1 - \rho_0}\right)}{\frac{1}{\sqrt{n - 3}}}$$

Rejection region:                                Rejection region:
$z > z_{\alpha}$ (upper tail region)             $|z| > z_{\alpha/2}$
$z < -z_{\alpha}$ (lower tail region)

Decision: If Z falls in the rejection region, reject the null hypothesis at significance level $\alpha$.
Assumption: (X, Y) follow the bivariate normal distribution, and this test procedure is approximate.


Example 8.5.1 
For the data given in Example 8.2.1, would you say that the variables X and Y are independent? Use 
a = 0.05. 


Solution 
We test 


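$$H_0: \rho = 0 \quad \text{vs.} \quad H_a: \rho \neq 0.$$

From Example 8.2.1, we have the following summary:

$$\sum_{i=1}^{n} x_i = 38; \quad \sum_{i=1}^{n} y_i = 46; \quad \sum_{i=1}^{n} x_i y_i = 709$$

and

$$\sum_{i=1}^{n} x_i^2 = 408; \quad \sum_{i=1}^{n} y_i^2 = 1302; \quad n = 10.$$

Hence,

$$r = \frac{(10)(709) - (38)(46)}{\sqrt{\left[(10)(408) - (38)^2\right]\left[(10)(1302) - (46)^2\right]}} = 0.99641.$$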


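The test statistic is

$$z = \frac{\frac{1}{2}\ln\left(\frac{1 + r}{1 - r}\right) - \frac{1}{2}\ln\left(\frac{1 + \rho_0}{1 - \rho_0}\right)}{\frac{1}{\sqrt{n - 3}}}
= \frac{\frac{1}{2}\ln\left(\frac{1 + 0.99641}{1 - 0.99641}\right) - \frac{1}{2}\ln\left(\frac{1 + 0}{1 - 0}\right)}{\frac{1}{\sqrt{7}}}
= 8.3618.$$

For $z_{\alpha/2} = z_{0.025} = 1.96$, the rejection region is $|z| > 1.96$. Because the observed value of the test statistic
falls in the rejection region, we reject the null hypothesis and conclude that at $\alpha = 0.05$, the variables X
and Y are dependent.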
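The arithmetic of Example 8.5.1 can be checked with a short Python sketch. It is an illustration only: the function names `sample_correlation` and `fisher_z_statistic` are hypothetical, the data are assumed to be the ten pairs of Example 8.2.1 (they match the summary sums used above), and the critical value 1.96 is taken from the normal table as in the example.

```python
import math

# Assumption: the ten (x, y) pairs of Example 8.2.1, consistent with the
# summary sums 38, 46, 709, 408, and 1302 used in Example 8.5.1.
x = [-3, -2, -1, 0, 2, 5, 6, 8, 11, 12]
y = [-9, -7, -5, -4, 2, 6, 9, 13, 21, 20]


def sample_correlation(x, y):
    """Sample correlation coefficient r, computed from the raw sums."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_xx = sum(a * a for a in x)
    sum_yy = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    return numerator / denominator


def fisher_z_statistic(r, rho0, n):
    """Approximate standard normal test statistic for H0: rho = rho0."""
    z_r = 0.5 * math.log((1 + r) / (1 - r))
    mu_z = 0.5 * math.log((1 + rho0) / (1 - rho0))
    return (z_r - mu_z) * math.sqrt(n - 3)   # dividing by 1/sqrt(n - 3)


r = sample_correlation(x, y)
z = fisher_z_statistic(r, rho0=0.0, n=len(x))
reject_h0 = abs(z) > 1.96                    # two-sided test at alpha = 0.05
print(f"r = {r:.5f}, z = {z:.4f}, reject H0: {reject_h0}")
```

The statistic is sensitive to rounding in r when r is close to 1, so small differences in the last decimal places from the hand calculation are to be expected.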


EXERCISES 8.5 


8.5.1. The table shows the midterm and final examination test scores for 10 students from a
differential equations class, where x denotes the midterm scores and y denotes the final
scores.

x | 68 | 87 | 75 | 91 | 82 | 77 | 86 | 82 | 75 | 79
y | 74 | 89 | 80 | 93 | 88 | 79 | 97 | 95 | 89 | 92

(a) At 95% confidence level, test whether X and Y are independent.
(b) Find the p-value.
(c) State any assumptions you have made in solving the problem.

8.5.2. The following table gives the annual incomes (in thousands of dollars) and amounts (in
thousands of dollars) of life insurance policies for eight persons.

Annual income  | 42  | 58  | 27 | 36 | 70  | 24 | 53  | 37
Life insurance | 150 | 175 | 25 | 75 | 250 | 50 | 250 | 100

(a) At the 98% confidence level, test whether annual income and the amount of life insurance
policies are independent.
(b) Find the attained significance level.
(c) State any assumptions you have made in solving the problem.

8.5.3. Show that

$$r = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\ \sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}}$$

is not an unbiased estimator of the population correlation coefficient, $\rho$.

8.5.4. Using the data in Example 8.2.1:
(a) Compute r, the coefficient of correlation.
(b) Would you say that the variables X and Y are independent? Use $\alpha = 0.05$.
(c) State any assumptions you have made in solving the problem.

8.5.5. A new drug is tested for serum cholesterol-lowering properties on six randomly selected
volunteers. The serum cholesterol values are given in the following table.

Before treatment: | 232 | 254 | 220 | 200 | 213 | 222
After treatment:  | 212 | 240 | 225 | 205 | 204 | 218

(a) At 95% confidence level, test whether X and Y are independent.
(b) Find the p-value.
(c) Calculate the least-squares regression line for these data.
(d) Interpret the usefulness of the model.
(e) State any assumptions you have made in solving the problem.
8.6 MATRIX NOTATION FOR LINEAR REGRESSION


Most real-life applications of regression analysis use models that are more complex than the simple 
straight-line model. For example, a person's body weight may depend not just on the person’s eating 
habits; it may depend on additional factors such as heredity, exercise, and type of work. Hence, we 
may want to incorporate other potential independent variables in the modeling. We now study the 
situation where $k\ (> 1)$ independent variables are used to predict the dependent variable. The model
to be studied is of the form

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon.$$

Here, $\varepsilon \sim N(0, \sigma^2)$. This model is called a multiple regression model.

Let $y_1, y_2, \ldots, y_n$ be $n$ independent observations on $Y$. Then each observation $y_i$ can be written as

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i,$$

where $x_{ij}$ is the jth independent variable for the ith observation, $i = 1, 2, \ldots, n$, and the $\varepsilon_i$s are
independent as in the simple linear regression case. It is sometimes advantageous to introduce matrices to
study the linear equations. Let $x_0 = 1$. Define the following matrices:


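$$X = \begin{bmatrix} x_0 & x_{11} & x_{12} & \cdots & x_{1k} \\ x_0 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ x_0 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}, \quad
Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad \text{and} \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}. \tag{8.8}$$

Thus the n equations representing the linear equations can be rewritten in the matrix form as

$$Y = X\beta + \varepsilon.$$

In particular, for the n observations from the simple linear model of the form

$$Y = \beta_0 + \beta_1 x + \varepsilon$$

we can write

$$Y = X\beta + \varepsilon,$$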
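where

$$Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad
\varepsilon = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}, \quad \text{and} \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}.$$

We can see that

$$X'X = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ x_1 & x_2 & \cdots & x_n \end{bmatrix}
\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}
= \begin{bmatrix} n & \sum_{i=1}^{n} x_i \\ \sum_{i=1}^{n} x_i & \sum_{i=1}^{n} x_i^2 \end{bmatrix},$$

where $'$ denotes the transpose of a matrix.

Also,

$$X'Y = \begin{bmatrix} \sum_{i=1}^{n} y_i \\ \sum_{i=1}^{n} x_i y_i \end{bmatrix}.$$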


Let us now go back to the multiple regression model

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon.$$

The least-squares estimators $\hat{\beta}_i$ of $\beta_i$ for $i = 0, 1, 2, \ldots, k$ are the ones that minimize the sum of
squares

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} \left[y_i - \left(\beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik}\right)\right]^2
= (y - X\beta)'(y - X\beta)
= y'y - y'X\beta - \beta'X'y + \beta'X'X\beta.$$

To minimize SSE with respect to $\beta$, we differentiate SSE with respect to $\beta$ and equate it to zero. Thus,

$$\frac{\partial}{\partial \beta}\left(y'y - y'X\beta - \beta'X'y + \beta'X'X\beta\right) = -2X'y + 2X'X\beta = 0,$$

yielding

$$(X'X)\hat{\beta} = X'Y.$$

Assuming the matrix $(X'X)$ is invertible, we obtain

$$\hat{\beta} = (X'X)^{-1}X'Y.$$

Now we summarize the procedure to obtain a multiple linear regression equation. 


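PROCEDURE TO OBTAIN A MULTIPLE LINEAR REGRESSION EQUATION
1. Rewrite the n observations

$$Y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$

in the matrix notation as

$$Y = X\beta + \varepsilon,$$

where X, Y, and $\beta$ are defined in (8.8).
2. Compute $(X'X)^{-1}$ and obtain the estimators of $\beta$ as

$$\hat{\beta} = (X'X)^{-1}X'Y.$$

3. Then the regression equation is

$$\hat{Y} = X\hat{\beta}.$$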

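Example 8.6.1
Using the data given in Example 8.2.1, use matrix operations to obtain the least-squares line.

Solution
From the data of Example 8.2.1 we have

$$Y = \begin{bmatrix} -9 \\ -7 \\ -5 \\ -4 \\ 2 \\ 6 \\ 9 \\ 13 \\ 21 \\ 20 \end{bmatrix} \quad \text{and} \quad
X = \begin{bmatrix} 1 & -3 \\ 1 & -2 \\ 1 & -1 \\ 1 & 0 \\ 1 & 2 \\ 1 & 5 \\ 1 & 6 \\ 1 & 8 \\ 1 & 11 \\ 1 & 12 \end{bmatrix}.$$

Thus, we can write

$$X'X = \begin{bmatrix} 10 & 38 \\ 38 & 408 \end{bmatrix}, \quad
X'Y = \begin{bmatrix} 46 \\ 709 \end{bmatrix}, \quad
(X'X)^{-1} = \begin{bmatrix} 0.1548 & -0.0144 \\ -0.0144 & 0.0038 \end{bmatrix}.$$

Hence,

$$\hat{\beta} = (X'X)^{-1}X'Y = \begin{bmatrix} 0.1548 & -0.0144 \\ -0.0144 & 0.0038 \end{bmatrix}\begin{bmatrix} 46 \\ 709 \end{bmatrix}
= \begin{bmatrix} -3.1009 \\ 2.0266 \end{bmatrix} = \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{bmatrix}.$$

Thus, the least-squares line is given by

$$\hat{y} = -3.1009 + 2.0266x,$$

which is identical to the regression line we obtained in Example 8.2.1.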
Example 8.6.2
The following data relate to the prices (Y) of five randomly chosen houses in a certain neighborhood, the
corresponding ages of the houses ($x_1$), and square footage ($x_2$).

Price y (thousands of dollars) | Age x1 (years) | Square footage x2 (thousands of square feet)
100                            | 1              | 1
80                             | 5              | 1
104                            | 5              | 2
94                             | 10             | 2
130                            | 20             | 3

Fit a multiple linear regression model

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

to the foregoing data.

Solution
We have

$$Y = \begin{bmatrix} 100 \\ 80 \\ 104 \\ 94 \\ 130 \end{bmatrix}; \quad
X = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 5 & 1 \\ 1 & 5 & 2 \\ 1 & 10 & 2 \\ 1 & 20 & 3 \end{bmatrix}; \quad
X'X = \begin{bmatrix} 5 & 41 & 9 \\ 41 & 551 & 96 \\ 9 & 96 & 19 \end{bmatrix}; \quad
X'Y = \begin{bmatrix} 508 \\ 4560 \\ 966 \end{bmatrix}$$

and

$$(X'X)^{-1} = \begin{bmatrix} 2.3076 & 0.1565 & -1.8840 \\ 0.1565 & 0.0258 & -0.2044 \\ -1.8840 & -0.2044 & 1.9779 \end{bmatrix}.$$

Hence,

$$\hat{\beta} = (X'X)^{-1}(X'Y) = \begin{bmatrix} 66.1252 \\ -0.3794 \\ 21.4365 \end{bmatrix}.$$

Thus, the regression model is

$$\hat{y} = 66.12 - 0.3794x_1 + 21.4365x_2.$$
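The matrix procedure above can be sketched in plain Python. This is only one possible way to carry it out: instead of forming $(X'X)^{-1}$ explicitly, the sketch solves the normal equations $(X'X)\hat{\beta} = X'Y$ by Gaussian elimination, which gives the same estimates. The helper names `normal_equations` and `solve` are hypothetical, and the data are those of Example 8.6.2.

```python
def normal_equations(X, y):
    """Return (X'X, X'y) for a design matrix X (list of rows) and response y."""
    columns = list(zip(*X))  # the columns of X are the rows of X'
    xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in columns] for ci in columns]
    xty = [sum(a * b for a, b in zip(ci, y)) for ci in columns]
    return xtx, xty


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the inputs are left unchanged.
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


# Data of Example 8.6.2: each row of X is (1, age, square footage).
X = [[1, 1, 1], [1, 5, 1], [1, 5, 2], [1, 10, 2], [1, 20, 3]]
y = [100, 80, 104, 94, 130]

xtx, xty = normal_equations(X, y)
beta_hat = solve(xtx, xty)
print("X'X =", xtx)
print("X'y =", xty)
print("beta_hat =", [round(b, 4) for b in beta_hat])
```

Solving the linear system directly avoids rounding the inverse to four decimals, so the estimates may differ from the hand calculation in the last digit.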


8.6.1 ANOVA for Multiple Regression
As in Section 8.3, we can obtain an ANOVA table for multilinear regression (with k independent or
explanatory variables) to test the hypothesis

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$$

versus

$$H_a: \text{At least one of the parameters } \beta_j \neq 0, \quad j = 1, \ldots, k.$$

The calculations for multiple regression are almost identical to those for simple linear regression,
except that the test statistic (MSR)/(MSE) has an $F(k, n - k - 1)$ distribution. Note that the F-test
does not indicate which of the parameters $\beta_j \neq 0$, except to say that at least one of them is not zero.
The ANOVA table for multiple regression is given by Table 8.4.

Table 8.4 ANOVA Table for Multiple Regression

Source of variation  | Degrees of freedom | Sum of squares | Mean sum of squares   | F-ratio
Regression (Model)   | k                  | SSR            | MSR = SSR/k           | MSR/MSE
Error (Residuals)    | n - k - 1          | SSE            | MSE = SSE/(n - k - 1) |
Total                | n - 1              | SST            |                       |
Example 8.6.3
For the data of Example 8.6.2, obtain an ANOVA table and test the hypothesis

$$H_0: \beta_1 = \beta_2 = 0 \quad \text{vs.} \quad H_a: \text{at least one of the } \beta_i \neq 0,\ i = 1, 2.$$

Use $\alpha = 0.05$.

Solution
We test $H_0: \beta_1 = \beta_2 = 0$ vs. $H_a$: At least one of the $\beta_i \neq 0$, $i = 1, 2$. Here n = 5, k = 2. Using Minitab,
we obtain the ANOVA table (Table 8.5). Based on the p-value, we cannot reject the null hypothesis at
$\alpha = 0.05$.

Table 8.5

Source of variation  | Degrees of freedom | Sum of squares | Mean sum of squares | F-ratio | p-Value
Regression (Model)   | 2                  | 956.5          | 478.2               | 2.50    | 0.286
Error (Residuals)    | 2                  | 382.7          | 191.4               |         |
Total                | 4                  | 1339.2         |                     |         |
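The entries of Table 8.5 can be reproduced by hand or with a short Python sketch such as the following. It is an illustration under stated assumptions: the coefficients are the rounded estimates reported in Example 8.6.2 rather than a fresh fit, the variable names are hypothetical, and the p-value uses the closed form $P(F > f) = (1 + 2f/\nu_2)^{-\nu_2/2}$, which holds only when the numerator degrees of freedom equal 2, as they do here.

```python
# Data of Example 8.6.2 and the rounded estimates reported there (assumption:
# the estimates are used as given, so sums of squares agree only approximately).
ages = [1, 5, 5, 10, 20]
sqft = [1, 1, 2, 2, 3]
price = [100, 80, 104, 94, 130]
beta = [66.1252, -0.3794, 21.4365]

n, k = len(price), 2
fitted = [beta[0] + beta[1] * a + beta[2] * s for a, s in zip(ages, sqft)]
y_bar = sum(price) / n

sst = sum((yi - y_bar) ** 2 for yi in price)               # total sum of squares
sse = sum((yi - fi) ** 2 for yi, fi in zip(price, fitted))  # error sum of squares
ssr = sst - sse                                             # regression sum of squares

df_error = n - k - 1
msr = ssr / k
mse = sse / df_error
f_ratio = msr / mse

# Closed-form upper tail of the F(2, df_error) distribution; this shortcut is
# valid only because the numerator degrees of freedom k equal 2.
p_value = (1 + 2 * f_ratio / df_error) ** (-df_error / 2)

print(f"SSR = {ssr:.1f}, SSE = {sse:.1f}, SST = {sst:.1f}")
print(f"MSR = {msr:.1f}, MSE = {mse:.1f}, F = {f_ratio:.2f}, p-value = {p_value:.3f}")
```

For a general number of predictors the tail probability has no such simple form, and a statistical package or an F-table would be used instead.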


EXERCISES 8.6 
8.6.1. Given the data 


PwWwhn Ww 
Ana w BS 


(a) Write the multiple regression model in matrix form. 
(b) Find $X'X$, $(X'X)^{-1}$, and $X'y$.
(c) Estimate B. 
(d) Estimate the error variance. 
8.6.2. A study is conducted to estimate the demand for housing (y) based on the current interest rate
$x_1$ and the rate of unemployment $x_2$. The data in Table 8.6.1 are obtained.
(a) Fit the multiple regression model

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon.$$

Table 8.6.1
Units sold | Interest rate (%) | Unemployment rate (%)
65         | 9.0               | 10.0
59         | 9.3               | 8.0
80         | 8.9               | 8.2
90         | 9.1               | 7.7
100        | 9.0               | 7.1
105        | 8.7               | 7.2

(b) Test whether the model is significant. 


8.6.3. The following data give the annual incomes (in thousands of dollars) and amounts (in 
thousands of dollars) of life insurance policies for eight persons. 


Annual income  | 42  | 58  | 27 | 36 | 70  | 24 | 53  | 37
Life insurance | 150 | 175 | 25 | 75 | 250 | 50 | 250 | 100


Calculate the least-squares regression line for these data using matrix operations. 


8.6.4. The following is a random sample of height (in inches) and weight (in pounds) of seven 
basketball players. 


Height | 73 | 83 | 77 | 80 | 85 | 71 | 80 
Weight | 186 | 234 | 208 | 237 | 265 | 190 | 220 


Calculate the least-squares regression line for these data using matrix operations. 


8.7 REGRESSION DIAGNOSTICS 


In the previous sections, we derived least-squares estimators for the parameters in the linear regression 
model. These estimators are useful as long as we can determine (1) how well the model fits the 
data and (2) how good our estimates are in providing possible relationships between variables of 
interest. Some of these problems are discussed in Chapter 14 in a unified manner. We now briefly 
discuss some aspects of the adequacy of the simple linear regression model. In multiple regression, 
in addition to the problems discussed here, there are other problems, such as collinearity and model 
specification (inclusion of all relevant variables, as well as exclusion of irrelevant variables), that need 
to be examined. They are beyond the level of this text. Many graphical methods and numerical tests 
dealing with these problems are available in the literature and are often called regression diagnostics. 
Most of the major statistical software packages incorporate these tests, making it easier to perform 
regression diagnostics so as to detect potential problems. 


We have seen that the (ordinary) least-squares regression model must meet the following assumptions. 
1. Linearity. The existence of a linear relationship between x and y is the basis of the simple linear
regression model. A simple method to test for linearity is to draw a scatterplot of data points.
As we explained in Section 8.2, we could also plot the residuals $e_i$ versus $x_i$ or $\hat{y}_i$. A systematic trend
in the plot of the residuals versus the explanatory variable or the fitted values indicates there
is a problem with the obtained regression model. For a correct model, the residuals should
center around zero across the explanatory variables and the fitted values. The degree of linear
relationship can be ascertained by the correlation coefficient, r, given in Section 8.5 or by
using the value of the coefficient of determination $r^2$, explained in Project 8B. Most statistical
software packages give the value of $r^2$ (refer to outputs given in Section 8.9). The closer the
value of $r^2$ is to 1, the better the least-squares equation $\hat{y} = \hat{\beta}_1 x + \hat{\beta}_0$ performs as a predictor
of y.


2. Homoscedasticity (homogeneity of variance). This assumption says that the variance of the
error term remains constant across all values of x. In this case we know by the Gauss-Markov
theorem that the least-squares estimators $\hat{\beta}_0$ and $\hat{\beta}_1$ are the best linear unbiased estimators of
$\beta_0$ and $\beta_1$. A frequently used graphical method is to plot the residuals versus the fitted values.
This can be easily done using statistical software packages. When the assumption is violated, the graph
of residuals $e_i$ versus fitted values $\hat{y}_i$ or explanatory variable $x_i$ shows a change in the spread of
residuals as $\hat{y}$ or x changes. It may look like Figure 8.7.
If the variances of $y_i$ values are not constant, the inferences we made, such as confidence
intervals on means, prediction, and so forth, are off. The severity of this discrepancy depends
on the degree of the assumption violation. If we see that the pattern of data points only changes
slightly, that will indicate a mild heteroscedasticity. Two numerical tests for heteroscedasticity
are explained in Section 14.4.3.

[Figure 8.7: Scatterplot of residuals versus the fitted values (response is C2).]
3. Independence of $\varepsilon_i$ and $\varepsilon_j$, for $i \neq j$. This assumption specifies that the errors associated
with one observation should not be correlated with the errors of any other observation. In
general, whether the two samples are independent of each other is decided by the structure of
the experiment from which they arise. Violation of the independence assumption can occur
in a variety of situations. For example, if we take a survey on a certain issue on children's education
from one particular school, these observations may reflect some pattern, thus violating
the independence assumption. If data are collected on the same variable over time, then the
assumption of independence will be violated. Project 12B explains a run test for checking this
assumption. Also, see Section 14.4.4.


4. Normality of the errors. This assumption specifies that the distribution of the $\varepsilon_i$ values should
be normal. This assumption is crucial when sample size is small if the p-value for the test is
to be valid. For large samples, by the Central Limit Theorem this assumption becomes less
important unless the prediction of a single value of y is involved. Thus a test of normality is
necessary mainly when the t-test is used. Section 14.4.1 explains some of the tests for normality.
A simple way is to draw a probability plot for the errors to confirm the assumption of
normality. If we observe nonnormality, one of the ways to overcome the problem is to use
data transformation such as logarithmic transformation, as explained in Section 14.4.2, and
perform the regression analysis on the transformed data. Sometimes nonparametric methods
may be more appropriate, but we will not deal with this topic in this book.


Another important issue is the existence of influential observations, individual observations that have
a strong influence on estimated coefficients. If a single observation substantially changes our results,
we need to do further investigation. The ordinary least-squares method is quite sensitive to outlying
observations, both for independent variables and for dependent variables, and these can have an
adverse effect on the estimate. In higher dimensional data, these outlying observations can remain
unnoticed. This aspect in the case of one explanatory variable is discussed in Project 8C. One of the simple
ways to identify such observations is to draw a scatterplot. In the scatterplot, if we see a data
point that is farther away from the rest of the data points, that is an indication of possible influential
points.
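Besides the scatterplot, a numerical screen is sometimes useful. The following Python sketch computes standardized residuals for the simple linear model and flags observations whose absolute value exceeds 2. Several things here are assumptions rather than part of the text: the formula $e_i / \left(s\sqrt{1 - h_i}\right)$ with $h_i = \frac{1}{n} + \frac{(x_i - \bar{x})^2}{S_{xx}}$ is the usual definition of a standardized residual, the cutoff of 2 is a common convention, the function name is hypothetical, and the data are taken to be those of Example 8.2.1.

```python
import math

# Assumption: the ten (x, y) pairs of Example 8.2.1, listed in increasing x.
x = [-3, -2, -1, 0, 2, 5, 6, 8, 11, 12]
y = [-9, -7, -5, -4, 2, 6, 9, 13, 21, 20]


def standardized_residuals(x, y):
    """Standardized residuals e_i / (s * sqrt(1 - h_i)) for a fitted line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    result = []
    for xi, ei in zip(x, residuals):
        leverage = 1 / n + (xi - x_bar) ** 2 / s_xx   # h_i for simple regression
        result.append(ei / (s * math.sqrt(1 - leverage)))
    return result


std_resid = standardized_residuals(x, y)
# Flag observations with a large standardized residual (cutoff 2 is a convention).
flagged = [(xi, yi, round(ri, 2)) for xi, yi, ri in zip(x, y, std_resid) if abs(ri) > 2]
print(flagged)
```

A flagged point is only a candidate for closer inspection; whether it is truly influential depends on how much the fit changes when it is removed.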


The natural question is, if we find that the data violate one or more of the assumptions, what can we
do about it? We have already explained that violation of the normality assumption in large samples is
not an issue unless prediction is involved, because prediction depends on normality of an individual
observation. Thus, if the inferences are based on the t- or F-tests or prediction is involved, we may be
able to transform Y to Y' to achieve normality. If we have predicted Y', then back-transform to predict
Y. If we observe nonlinearity of data, we may be able to transform x to x' = h(x) such that Y is linear
in x', or consider a polynomial model in x, in which case the ideas of multiple linear regression may
be utilized. Robust estimates of variances of $\hat{\beta}_0$ and $\hat{\beta}_1$ or the method of weighted least squares may
be used to deal with the case of nonconstant variance. Often careful experimental design could be
done to remove possible correlation in errors. There are also robust methods available for correlation
analysis. We refer to specialized books on regression methods for further details on these issues. If
we detect influential observations, there are statistical techniques available, such as least trimmed
squares estimators, to deal with outlying observations.
8.8 CHAPTER SUMMARY


In this chapter, we first derived the least-squares line and its properties. Then we learned about the 
confidence intervals for the coefficients in the regression model and did hypothesis tests on the values 
of the coefficients. We introduced the matrix notation for linear regression as well as for multiple 
regression. We discussed how to predict a particular value of Y for a given value of X. In order to 
study the dependence of X and Y, we presented correlation analysis. 


The following are some of the key definitions we have used in this chapter. 


■ Predictors
■ Response variable
■ Regression analysis
■ Multiple linear regression model
■ Simple linear regression model
■ Sum of squares for errors (SSE)
■ Sum of squares of the residuals
■ Least-squares line
■ Least-squares equations
■ Normal equations
■ Best linear unbiased estimator (BLUE)
■ Correlation analysis

The following important concepts and procedures were discussed in this chapter:

■ Procedure for regression modeling
■ Procedure for fitting a least-squares line
■ Properties of the least-squares estimators for the model $Y = \beta_0 + \beta_1 x + \varepsilon$
■ The Gauss-Markov theorem
■ Procedure for obtaining confidence intervals of $\beta_0$ and $\beta_1$
■ Procedure to obtain a multiple linear regression equation
■ Prediction interval for the response variable Y
■ Hypothesis testing for correlation
■ Linearity
■ Homoscedasticity
■ Independence of $\varepsilon_i$ and $\varepsilon_j$, for $i \neq j$
■ Normality of the errors
■ Influential observations
8.9 COMPUTER EXAMPLES
8.9.1 Minitab Examples



Example 8.9.1 
For the data in Example 8.2.1, use the method of least squares to fit a straight line to the accompanying 
data points. Give the estimates of 89 and £1. Plot the points and sketch the fitted least-squares line. 


Solution 
Enter independent variable, x, in C1 and the response variable, y, in C2. Then: 


Stat > Regression > Regression. .. > in Response: type C2, and in Predictors: type C1 > click OK 
We obtain the following output. 
Regression Analysis 


The regression equation is 
C2 = -3.10 + 2.03 C1

Predictor      Coef    StDev       T      P
Constant    -3.1009   0.3888   -7.98  0.000
C1          2.02656  0.06087   33.29  0.000

S = 0.9883   R-Sq = 99.3%   R-Sq(adj) = 99.2%

Analysis of Variance

Source          DF      SS      MS        F      P
Regression       1  1082.6  1082.6  1108.34  0.000
Residual Error   8     7.8     1.0
Total            9  1090.4

Unusual Observations

Obs    C1      C2     Fit  StDev Fit  Residual  St Resid
  8  11.0  21.000  19.191      0.538     1.809     2.18R


R denotes an observation with a large standardized 
residual 


From this the estimate of $\beta_0$ is -3.1009, and the estimate of $\beta_1$ is 2.02656. Hence, the regression line
is $\hat{y} = -3.1009 + 2.02656x$. Now to obtain the fitted regression line, use the following procedure:

Stat > Regression > Fitted Line Plot... > in Response(Y): type C2, and in Predictors(X): type C1 >
click Linear > OK
We obtain the following graph.

[Figure: Regression plot. Y = -3.1009 + 2.02656X, R-Sq = 99.3%.]

If, in addition, we need, say, 95% confidence and predictor bands, then use

Stat > Regression > Fitted Line Plot... > in Response(Y): type C2, and in Predictor(X): type C1 > click
Linear > click Options... > click Display confidence bands and Display predictor bands > in Title:
type a title for the graph and OK > OK

We obtain the following graph.

[Figure: Regression line with 95% confidence and predictor bands. Y = -3.1009 + 2.02656X, R-Sq = 99.3%. Legend: Regression, 95% CI, 95% PI.]
8.9.2 SPSS Examples


A detailed explanation of regression methods including diagnostics using SPSS can be obtained at the 
site: http://www.ats.ucla.edu/stat/spss/webbooks/reg/. We will just demonstrate a simple case with 
an example. 


Example 8.9.2 
The following is a random sample of height (in inches) and weight (in pounds) of seven basketball players. 


Height | 73  | 83  | 77  | 80  | 85  | 71  | 80
Weight | 186 | 234 | 208 | 237 | 265 | 190 | 220


Calculate the least-squares regression line for these data using SPSS. 


Solution 
Enter height in column 1 and weight in column 2. Then 


Analyze > Regression > Linear... > move var00002 to Dependent:, and var00001 to
Independent(s): > click OK

We obtain the following output:

Regression: Variables Entered/Removed

Model | Variables Entered | Variables Removed | Method
1     | VAR00001          | .                 | Enter

a All requested variables entered.
b Dependent Variable: VAR00002

Model Summary:

Model | R    | R Square | Adjusted R Square | Std. Error of the Estimate
1     | .947 | .897     | .876              | 9.86006

a Predictors: (Constant), VAR00001

ANOVA:

Model        | Sum of Squares | df | Mean Square | F      | Sig.
1 Regression | 4223.896       | 1  | 4223.896    | 43.446 | .001
  Residual   | 486.104        | 5  | 97.221      |        |
  Total      | 4710.000       | 6  |             |        |

a Predictors: (Constant), VAR00001
b Dependent Variable: VAR00002

Coefficients:

             | Unstandardized Coefficients | Standardized Coefficients |        |
Model        | B        | Std. Error       | Beta                      | t      | Sig.
1 (Constant) | -188.476 | 62.083           |                           | -3.036 | .029
  VAR00001   | 5.208    | .790             | .947                      | 6.591  | .001

a Dependent Variable: VAR00002

Looking at the coefficients, we see that $\hat{\beta}_0 = -188.476$ and $\hat{\beta}_1 = 5.208$. Hence, the regression line is given
by $\hat{y} = -188.476 + 5.208x$. Because the coefficient of determination $r^2$ is 0.897, and the p-value is small,
the model fit looks pretty good.


8.9.3 SAS Examples 


For regression analysis, we can use the SAS procs called GLM, which stands for General Linear Model, 
and REG, which stands for regression. In the following example we will give a simplified version of 
the foregoing procedure. A good explanation of regression methods including diagnostics using SAS 
can be obtained at http://www.ats.ucla.edu/stat/sas/webbooks/reg/. 




Example 8.9.3 
Using the SAS commands, redo Example 8.9.1. 


Solution 
We can use the following commands. 


options nodate nonumber;
data exreg;
   input x y @@;
   datalines;
-1 -5   0 -4   2 2   -2 -7   5 6
 6 9    8 13   11 21  12 20  -3 -9
;
proc reg data=exreg;
   title 'Regression of Y on X';
   model y=x / p clm;
run;


We obtain the following output. 




Regression of Y on X 


The REG Procedure 
Model: MODEL1 
Dependent Variable: y 
Analysis of Variance 


Sum of Mean 
Source DF Squares Square F Value Pr > F 
Model 1 1082.58589 1082.58589 1108.34 <.0001 
Error 8 7.81411 0.97676 
Corrected Total 9 1090.40000 
Root MSE 0.98831 R-Square 0.9928 
Dependent Mean 4.60000 Adj R-Sq 0.9919 
Coeff Var 21.48508 
Parameter Estimates 
Parameter Standard 
Variable DF Estimate Error t Value Pr > |t| 
Intercept 1 -3.10091 0.38882 -7.98 <.0001 
x 1 2.02656 0.06087 33.29 <.0001


Regression of Y on X 


The REG Procedure 
Model: MODEL1 
Dependent Variable: y 




Output Statistics 


       Dep Var  Predicted     Std Error
Obs          y      Value  Mean Predict        95% CL Mean      Residual
  1    -5.0000    -5.1275        0.4278    -6.1141    -4.1409     0.1275
  2    -4.0000    -3.1009        0.3888    -3.9975    -2.2043    -0.8991
  3     2.0000     0.9522        0.3312     0.1885     1.7159     1.0478
  4    -7.0000    -7.1540        0.4715    -8.2413    -6.0667     0.1540
  5     6.0000     7.0319        0.3210     6.2917     7.7720    -1.0319
  6     9.0000     9.0584        0.3400     8.2743     9.8425    -0.0584
  7    13.0000    13.1115        0.4038    12.1804    14.0427    -0.1115
  8    21.0000    19.1912        0.5383    17.9499    20.4325     1.8088
  9    20.0000    21.2178        0.5889    19.8597    22.5758    -1.2178
 10    -9.0000    -9.1806        0.5187   -10.3766    -7.9845     0.1806
Sum of Residuals 0 
Sum of Squared Residuals 7.81411 


Predicted Residual SS (PRESS) 14.18340 


By looking at the parameter estimates in the foregoing output, we see that an intercept value of
-3.10091 is the estimate of $\beta_0$, and the estimate of $\beta_1$ is 2.02656, corresponding to the variable x.
For each value of x, the actual value and predicted value of y are given as the output statistics. 


It is important to note that presenting the results of an analysis simply is as important as the
analysis itself. For example, if one is interested only in a simple linear regression, most of the
values in the foregoing output may not be necessary. The values up to the parameter estimates
give the analysis of variance results, and the values under Output Statistics deal with
predicted values and confidence intervals. For clarity and simplicity, we may need to report only
the regression line, and perhaps the graph of the line.


If we need the plot of the points (x, y), add the following commands to the previous program. We 
will not give the corresponding graph. 


proc plot data=exreg; 
title 'Plot of Y Vs. X';
plot y*x; 

run; 


If we need the graph of the regression line along with, say, 95% prediction and confidence intervals, 
we add the following. 


proc gplot data=exreg;
/* three overlaid plots: data points, 95% CL for the mean, 95% CL for individual predictions */
plot y*x
     y*x
     y*x / overlay frame vaxis=axis1 haxis=axis2;
symbol1 v=- h=1.5 i=none c=black;
symbol2 v=none i=rlclm95 c=red;
symbol3 v=none i=rlcli95 c=blue;
axis1 order = (-5 to 14 by 1)
      offset = (1)
      label = (h=1.5 f=duplex);
axis2 order = (-10 to 20 by 1)
      offset = (1)
      label = (h=1.5 f=duplex);
title h=1.5
      'EFFECT OF X ON Y';
title2 h=1.2 f=duplex
      'Common regression line with 95% confidence interval';
title3 h=1.5 f=duplex
      "Regression line is predicted Y=-3.1011 +2.0266X";
run;


PROJECTS FOR CHAPTER 8 
8A. Checking the Adequacy of the Model by Scatterplots 


If the regression model is adequate, then the fitted equation can be used to make inferences. Otherwise, 
the inferences made will be practically useless. Note that the residuals give all the information on 
lack of fit. Figures 8.5 and 8.6 give an indication of good fit and misfit. 


(i) Collect a couple of real-life data sets and find a regression line for each.
(ii) Draw the scatterplot of the residuals $e_i$ versus x and determine whether the regression lines
obtained in (i) are a good fit or not.


8B. The Coefficient of Determination 


One of the ways to measure the contribution of x in predicting y is to consider how much the
prediction errors were reduced by using the information provided by the variable x. The quantity
called the coefficient of determination measures how well the least-squares equation
$\hat{y} = \hat{\beta}_1 x + \hat{\beta}_0$ performs as a predictor of y. If x contributes no information for
predicting y, then the best prediction for values of y is simply the sample mean $\bar{y}$. The resulting
sum of squares of deviations for this model $\hat{y} = \bar{y}$ is $S_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$.
In the case where x contributes information for predicting y, we have seen that the sum of squares
of deviations for the model $\hat{y} = \hat{\beta}_1 x + \hat{\beta}_0$ is $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$.
It can be shown that $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \le \sum_{i=1}^{n} (y_i - \bar{y})^2$.


The coefficient of determination is the proportion of the sum of squares of deviations of the y values 
that can be credited to a linear relationship between x and y. This is defined by 


$$ r^2 = \frac{S_{yy} - SSE}{S_{yy}} = 1 - \frac{SSE}{S_{yy}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}. $$


We can see that $0 \le r^2 \le 1$. We can interpret $r^2$ to be the proportion of variability explained by
the regression line. When x contributes no information for predicting y, $S_{yy}$ and SSE will be nearly
equal, and hence $r^2$ will be near zero. If x contributes information for predicting y, $S_{yy}$ will be
larger than SSE, and hence $r^2$ will be greater than zero. Thus, $r^2 = 0.75$ means that use of $\hat{y}$ instead of
$\bar{y}$ to predict y reduced the sum of squares of deviations of the y values about their predicted values
by 75%. This can also be interpreted as meaning that about 75% of the variation is explained by the
independent variable x. In general, about $(r^2 \times 100)\%$ of the sample variation in y can be attributed
to using x to predict y in the linear model. The coefficient of nondetermination is the proportion of variation
that is unexplained by the regression equation and is given by $1 - r^2$.
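

To make the definition concrete, the following Python sketch computes $S_{yy}$, SSE, and $r^2$ for a simple linear regression. It is an illustration only: the function name `coefficient_of_determination` is hypothetical, and the data are assumed to be the ten base points of Project 8C, for which the SAS output earlier in the chapter reports an R-Square of 0.9928.

```python
def coefficient_of_determination(x, y):
    """Return (S_yy, SSE, r^2) for the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # S_yy: squared prediction error when y-bar is used to predict every y.
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    # SSE: squared prediction error when the fitted line is used instead.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return s_yy, sse, 1 - sse / s_yy


# Assumed data: the ten base points of Project 8C.
x = [-1, 0, 2, -2, 5, 6, 8, 11, 12, -3]
y = [-5, -4, 2, -7, 6, 9, 13, 21, 20, -9]

s_yy, sse, r2 = coefficient_of_determination(x, y)
print(f"S_yy = {s_yy:.4f}, SSE = {sse:.4f}, r^2 = {r2:.4f}")
```

The reduction from $S_{yy}$ to SSE, taken as a fraction of $S_{yy}$, is exactly the quantity $r^2$ defined above.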


(i) For Exercises 8.2.2 and 8.2.3, find the coefficient of determination, and discuss the 
information contributed by x in predicting y. 

(ii) Collect a couple of real-life data sets and find the corresponding regression lines. Also draw the
scatterplot for $e_i$ versus j and determine whether the regression line obtained is a good fit or
not based on the coefficient of determination. 


8C. Outliers and High Leverage Points 


One of the important aspects of residual analysis is to identify any unusual observations in
a data set. There are two ways in which a data point can be unusual. It could be unusual in the response
variable (i.e., in the vertical direction), representing model failure, or in the predictor variable (i.e., in the
horizontal direction). It should be noted that unusual observations in the horizontal direction occur
when we assume that the independent variable X in the linear model is random. An observation that
is unusual in the vertical direction is called an outlier. An observation that is unusual in the horizontal
direction is called a high leverage point (or just leverage point).


Consider the following 10 points, which we will call base points, and three additional points 
representing an outlier (O), a high leverage point (H), and both (OH), respectively. 


                  10 base points                        O     H    OH
x    -1    0    2   -2    5    6    8   11   12   -3    6    19    19
y    -5   -4    2   -7    6    9   13   21   20   -9   30    13    30


Investigate the effect of adding a single aberrant point by running four separate regressions: 
(i) regression for 10 base points; (ii) regression for 10 base points plus O; (iii) regression for 10 
base points plus H; and (iv) regression for 10 base points plus OH. For each of them, find $\hat{\beta}_0$ and
$\hat{\beta}_1$ as well as the coefficient of determination. Discuss the effects of each type of outlier on the
regression line. 


Chapter 9


Design of Experiments


Objective: To study the basic design concepts for experiments, through which we can compare
treatments with respect to the observed responses.


Genichi Taguchi
(Source: http://www.amsup.com/BIOS/g_taguchi.html) 


Genichi Taguchi (1924-) acquired his statistical skills under the guidance of Prof. Motosaburo 
Masuyama, one of the best statisticians of his time. After World War II, Japanese manufacturers were 
struggling to survive with very limited resources. Taguchi revolutionized the manufacturing process
in Japan through cost savings. He understood that all manufacturing processes are affected by outside
influences, known as noise, and he developed methods of identifying those noise sources that have
the greatest effects on product variability. Isolating these factors to determine their individual effects
can be a very costly and time-consuming process. Taguchi devised a way to use the so-called
orthogonal arrays to isolate these noise factors from all others in a cost-effective manner. He introduced
the loss function to quantify the decline of a customer's perceived value of a product as its
quality declines. Taguchi referred to the ability of a process or product to work as intended regardless of
uncontrollable outside influences as robustness. This was a novel concept in the design of experiments
with profound influence in manufacturing. His ideas have been adopted by successful
manufacturers around the globe because of their results in creating superior production processes at much
lower costs.


9.1 INTRODUCTION 


In statistics, we are concerned with the analysis of data generated from an experiment. It is
desirable to take the necessary time and effort to organize the experiment appropriately so that we have
the right type of data and a sufficient amount of data to answer the questions of interest as clearly
and efficiently as possible. This process is called experimental design. We can trace the roots of
modern experimental design to the 1935 publication of the book The Design of Experiments, written by
Sir Ronald A. Fisher. He showed how one could conduct credible experiments in the presence of
many naturally fluctuating conditions such as the soil condition, temperature, and rainfall, in an
agricultural experiment. Since then, the design principles that were developed for agricultural
experiments have been successfully adapted to industrial, military, and other applications. In modern
industry it is essential to manufacture parts efficiently and with practically no defects. As a result, 
variation reduction in quality characteristics of these parts has become a major focus of quality and 
productivity improvement. Dr. Genichi Taguchi pioneered the use of design of experiments (DOE) 
in designing robust products—those relatively insensitive to changes in design parameters. Presently, 
DOE is used as an essential tool for improving the quality of goods and services. It is important to 
note that, unless a sound design is employed, it may be very difficult or even impossible to obtain 
valid conclusions from the resulting data. Also, properly designed experiments will generate more 
precise data while using substantially fewer experimental runs than ad hoc approaches. In industrial
manufacturing, some of the major benefits of DOE are lower costs, simultaneous optimization
of several factors, fast generation and organization of quantitative information, and overall quality 
improvement. 


It is important to clearly identify the particular questions that an experiment is intended to answer 
(that is, the major objective of the experiment) before performing the experiment. These objectives 
may be to estimate or predict some unknown parameters, to explore relationships among various 
factors, to compare a collection of effects or parameters, or any combinations of these. When the 
intention is to compare parameters, the objective may be to corroborate a hypothesis, or to explore 
some simple relationships. In any design, it is necessary to identify the populations that are to be 
studied and the type of information about these populations that will be needed to answer the desired 
questions. While planning an experiment to investigate the primary objectives of the investigation, 
we need to ensure that the measurement process is simple, the cost of the study is reasonable, the
study can be concluded in a reasonable time frame, and the study produces reliable data. Because 
of the complex nature of real-world problems, planning an effective experiment is not an easy task. 
The important issues confronting one area, say engineering, will be different from those for another 
area such as biology or medicine. As a result, the design of experiments can take several forms. In 
this chapter, we will follow a general framework. Two of the major distinguishing elements of DOE 
are (1) simultaneous variation and evaluation of various factors, and (2) systematic removal of some 
of the possible test combinations to cut back experimental time and cost. Thus, a researcher should 
ensure that the statistical design is as simple as possible given the objectives of the experiment and 
within the practical constraints such as material, labor, and cost. Some other desirable criteria of a 
good design are that it provides unbiased estimates of treatment effects and the experimental error. 
In addition, it should be able to detect important small differences with sufficient precision, and it 
should provide an estimation of uncertainty in the conclusions and the confidence with which the 
result can be extended to other analogous situations. The experimental design determines the basic 
characteristics of the data collected. These data are then processed using statistical analysis techniques, 
with the goals of these analyses being determined by the experimental objectives. Conclusions are 
obtained by looking at the results of the statistical analyses. 


9.2 CONCEPTS FROM EXPERIMENTAL DESIGN 


In this section we introduce some of the basic definitions, methods, and procedures used in the 
experimental design. Many of the terms used have an agricultural basis, because the early development 
and applications of DOE were in the field of agriculture. 


9.2.1 Basic Terminology 


The first step in planning an experiment is to formulate a clear statement of the objectives of the test
program. The purpose of most statistical experiments is to determine the effect of one or more
independent variables on the response variable. The main variable of interest in a study is the response
variable, also called the output variable. It is the dependent variable (also referred to as the criterion,
effect, or predicted variable) in an experiment, that is, the quantity we are interested in predicting
or comparing. The response variable is measured at different values of the independent variables
(representing those factors that are assumed to be the causes of the outcome) and analyzed to determine
whether the independent variables have any effect. For example, in an agricultural experiment, the
crop yield could be the response variable, whereas the type of soil, temperature, and rainfall could 
be the independent variables. We would like also to identify known or expected sources of variability 
in the experimental units, because one of the main aims of a designed experiment is to reduce the 
effect of these sources of variability on the answers to questions of interest. Hence, we must make a 
list of the factors that may affect the value of the response variable. We must also decide how many 
observations should be taken and what values should be chosen for each independent variable in 
each individual test run. 


Definition 9.2.1 The variables that an experimenter is able to completely control in the DOE are called 
independent variables or treatment variables. These are also called input variables, explanatory 
variables, or factors. 


Basically, factors are independent variables whose effect on the response variable is a main objective
of the study. These are control variables selected by the analyst for comparison. A factor is a general 
category or type of treatment. Factors can be either quantitative or qualitative based on whether the 
variable is measured on a numerical scale or not. For example, a rice field is divided into six parts, and 
each part is treated with a different fertilizer to see which produces the most rice. Here the response 
variable is the amount of rice output. The objective of the study is to compare the effects of different 
fertilizers on the rice output. Thus, the type of fertilizer is the factor. 


Definition 9.2.2 Independent variables that are unknown or known but nonmanipulable are called 
nuisance variables. 


A factor can have different levels referred to as the treatment or factor levels. Different treatments 
constitute different levels of a factor. Levels are the values at which the factors are set in an experiment. 
The level of a variable or treatment means its amount or magnitude. For example, if the experimental
units were given a medication in doses of 2.5 mg, 5 mg, and 10 mg, those amounts would be three levels of
the treatment. The term level is also used for categorical variables, such as drugs I, II, and III, where the three
are different kinds of drugs, not different amounts of the same thing. Suppose four different groups 
of students are subjected to four different teaching methods. The students are the experimental units, 
the teaching methods are the treatments, and the four types of teaching methods constitute four 
levels of the factor “type of teaching.” Note that this is a single-factor experiment, the factor being the 
method of teaching. 


Definition 9.2.3 Noise is the effect of all the uncontrolled factors in an experiment. 


In some experiments, all the noise factors are known; however, in most cases only some of them are 
known. When an analyst controls the specification of the treatments and the method of allocating 
the experimental units to each of the treatments, the experiment is called designed. For example, n
rats are randomly assigned to one of the five dose levels of an experimental drug under investigation.
The analyst can also decide on the number $n_i$ of rats for each dose level such that $\sum_{i=1}^{5} n_i = n$.


Sometimes, conducting a designed experiment may not be practical or ethical. For example, if an 
analyst wants to know the relationship between fat content in a diet and the cholesterol level, it 
would be unethical and costly as well as time consuming to subject human volunteers to different 
fat-content diets. However, it is possible to observe the cholesterol levels of people who consume 
different diets. Care must be taken to record various other factors, such as exercise habits, age, and 
gender, before reporting any association between cholesterol levels and fat content of diets. The 
experiment is called observational, if the analyst is just an observer of the treatments on a sample of 
experimental units. Note that the experimental units are objects to which treatments are applied. 


The crucial difference between an experiment and an observational study for comparing the effects of 
treatments is that, in an experiment, the researcher decides which experimental units receive which 
treatments, whereas in an observational study, the researcher simply compares experimental units 
that happen to be there that have received each of the treatments. Observational studies are often 
useful for identifying possible causes of treatment effects, and they are often cheaper. Their main 
disadvantage is that they are less conclusive. Only properly designed and executed experiments can 
lead to reliable conclusions. Hence, in general, designed experiments are preferred over observational
experiments. In designing the experiment, there are almost always going to be constraints such as 
budget, time, and availability of experimental units. 


The following example illustrates an observational experiment, where the analyst has control over the 
random sampling from the treatment populations as well as the size of each sample, but has no 
control over the assignment of the experimental units to the treatments. 


Example 9.2.1
In order to compare the risk-taking tendency of the people that invest in mutual funds, samples are taken 
of individuals from three income groups—low income class, middle income class, and high income class. 
A score is given based on the percentage of their investment allocation on different types of mutual funds, 
such as large-cap, mid-cap, small-cap, hybrid, and specialty. The mean score for each income group is 
calculated. Identify each of the following elements: response, factors and factor type(s), treatments, and 
experimental units. 


Solution 
The response is the variable of interest, which is the score given to each individual investor. The only factor 
investigated is the income class. This is a qualitative variable. The three income classes represent the levels of 
this factor. The treatment is the percentage investments in different types of mutual funds, such as large-cap, 
mid-cap, small-cap, hybrid, and specialty. The experimental unit is the individual investor. 



There are single-factor experiments and multifactor experiments. The previous example was a case of 
a single-factor experiment. Single-factor experiments have only one independent variable. Another 
example of a single-factor experiment is when we are interested in the effect of size of the screen of 
a computer monitor on the reading speed. In this case, the size of the screen is the single factor. If 
there are only two sizes, say 15-in. and 17-in. monitors, that we wish to compare, tests such as the 
two-sample t-test could be used to compare average reading speed. If there are more than two sizes 
of monitors, then the one-way ANOVA methods described in Chapter 10 could be used for analysis 
of the resulting data. 


Even though the single-factor experiments are simple and elegant, they are costly and not very effective 
when there is more than one independent variable. Efficient use of resources is achieved through 
multifactor experiments in comparison to conducting many single-factor experiments. A multifactor 
experiment involves two or more independent variables and a dependent variable. Also, a greater 
range of questions could be answered using multifactor experiments. The resulting data are analyzed 
using ANOVA as described in Chapter 10. The following is an example of a multifactor experiment. 


Example 9.2.2
In order to study the conditions under which a particular type of commercially raised fish reach maximum 
weight, an experiment is conducted at four water temperatures (60°F, 70°F, 80°F, 90°F) and four water 
salinity levels (1%, 5%, 10%, 15%). Fish are raised in tanks with specific salinity levels and temperature 
levels. There are 32 tanks and one of the four temperatures and one of the four salinity levels are assigned
randomly to each tank. The weights are recorded at the beginning of the experiment and after 2 months. 
Identify each of the following elements: response, and factors and factor type(s). Write all the treatments 
from the factor-level combinations. 


Solution 
The response is the variable of interest, which is the weight gain of a fish. This experiment has two factors:
water temperature at four levels and water salinity at four levels. There are 4 × 4 = 16 possible treatments:


(60°F, 1%) (60°F, 5%) (60°F, 10%) (60°F, 15%) 
(70°F, 1%) (70°F, 5%) (70°F, 10%) (70°F, 15%) 
(80°F, 1%) (80°F, 5%) (80°F, 10%) (80°F, 15%) 
(90°F, 1%) (90°F, 5%) (90°F, 10%) (90°F, 15%) 


It should be noted that there may be other factors, such as the density of the fish population, the 
initial size of the fish, and the type of feeding, that may affect weight gain of fish. 
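

The factor-level combinations listed in the solution can also be generated mechanically. The Python sketch below enumerates the 16 treatments and then randomly allocates the 32 tanks among them. The even split of two tanks per treatment is an assumption made here for illustration (the example states only that levels are assigned randomly to each tank), and the seed value is arbitrary.

```python
import random
from itertools import product

temperatures = [60, 70, 80, 90]   # water temperature, degrees Fahrenheit
salinities = [1, 5, 10, 15]       # water salinity, percent

# Every factor-level combination is one treatment: 4 x 4 = 16.
treatments = list(product(temperatures, salinities))

# Assumption: the 32 tanks are split evenly, two tanks per treatment.
tanks = list(range(1, 33))
rng = random.Random(2024)         # fixed seed so the layout can be reproduced
rng.shuffle(tanks)
assignment = {
    treatment: sorted(tanks[2 * i: 2 * i + 2])
    for i, treatment in enumerate(treatments)
}

for (temp, sal), tank_ids in assignment.items():
    print(f"({temp}°F, {sal}%): tanks {tank_ids}")
```

Because each treatment receives more than one tank under this assumption, the design includes replication, which is needed later to estimate the experimental error.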


Definition 9.2.4 The experimental error explains the variation in the responses among experimental units 
that are assigned the same treatment and observed under identical experimental conditions. 


Experimental error can occur for many reasons, among them (1) the difference in the devices that 
record the measurements, (2) the natural dissimilarities in the experimental units prior to their 
receiving the treatment, (3) the variation in setting the treatment conditions, and (4) the effect on 
the response variable of all extraneous factors other than the treatment factors. 


In order to construct confidence intervals on the treatment population means and to test hypotheses, it 
is necessary to obtain an estimate of the variance of experimental error. In a single-factor experiment
with k levels, the estimate of the variance of experimental error could be taken as the pooled
variance of responses from experimental units receiving the identical treatments. A large variance of 
experimental error will compromise the accuracy of inferences made from the experiments. Also, large 
amounts of experimental error make it difficult to determine whether the treatment has produced 
an effect or not, so one of the design goals is to reduce the experimental error. Bad execution of a 
design can lead to the whole experiment becoming a waste of time and resources. It is necessary 
to implement techniques to reduce experimental error in order to obtain more accurate inferences. 
One approach to reducing experimental error is to take extra care in conducting the experiment. The 
effect of experimental error can be reduced by using more homogeneous experimental materials (if 
available), and using the fundamental principles of replication, randomization, and blocking (see 
Section 9.2.2). 
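

The pooled variance mentioned above combines the within-treatment sample variances, weighting each by its degrees of freedom. The following Python sketch shows one way to compute it; the function name and the response values are hypothetical and are not taken from the text.

```python
from statistics import variance


def pooled_variance(groups):
    """Pool the sample variances of groups that each received one treatment.

    s_p^2 = sum((n_i - 1) * s_i^2) / sum(n_i - 1)
    Each group needs at least two observations.
    """
    numerator = sum((len(g) - 1) * variance(g) for g in groups)
    degrees_of_freedom = sum(len(g) - 1 for g in groups)
    return numerator / degrees_of_freedom


# Hypothetical responses for k = 3 treatment levels (illustrative only).
responses = [
    [12.1, 11.8, 12.6, 12.3],
    [13.4, 13.9, 13.1, 13.6],
    [11.2, 10.9, 11.5],
]
print(f"pooled variance = {pooled_variance(responses):.4f}")
```

The estimate uses only variation among units that received the same treatment, so differences between treatment means do not inflate it.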


The one-way analysis of variance (in a single-factor experiment at several levels) enables one to compare 
several groups of observations, all of which are independent with the possibility of a different mean 
for each group. The test of significance assesses whether or not all the means are equal. Two-way analysis of
variance is a method of studying the effects of two factors on the response variable. 


There are other terms that are important in different applications. For example, in the medical field,
the terms blinding, double-blind, and placebo are used. In a medical experiment, the comparison of
treatments may be distorted if the patient, the person administering the treatment, and those
evaluating it know which treatment is being allocated to which patient. It is therefore necessary to ensure
that the patient, and/or the person administering the treatment, and/or the trial evaluators do not 
know (are blind to) which treatment is allocated to whom. If only the patient is unaware of the 
treatment, it is called blinding, and if both the patient and the person administering the treatment 
are blind to which treatment is being allocated, it is called double-blinding. In order to study the effect 
of a particular drug, experimenters divide the study population into two groups and treat one group 
with the drug and the other group with a so-called placebo, which could be just sugar pills. In order 
to clarify the objective of a design, it is necessary for an experimental designer to consult a wide range 
of people, especially those affected by the problem to be solved. 


9.2.2 Fundamental Principles: Replication, Randomization, and Blocking 


A good design of an experiment makes efficient use of resources to gather the data needed to meet 
the goals of the study. There are three fundamental principles that need to be considered in a good 
experimental design. They are replication, randomization, and blocking. 


Definition 9.2.5 Replication means that the same treatment is applied (i) several times to the same 
experimental units, or (ii) one time to several similar experimental units, called replicate units. 


Replications are necessary for the estimation of the error variance in an experiment against which 
the differences among treatments are assessed. If an experiment is intended to test whether or not 
a number of treatments differ in their effects, these treatments must be applied to replicate units of 
the experiment. In order to show that two treatments have different mean effects, we need to
measure several samples given the same treatment. For example, observing that one plant of a particular
genotype is more resistant to a disease than another plant of a different genotype does not convey
anything about the difference between the mean disease resistance of the two genotypes. This
difference could have been caused by the environment or the inoculation procedure affecting the two
plants differently. Hence, to make any inference about the mean difference between the genotypes, 
we have to test several plants of each type. Thus, increasing the number of replications increases 
the reliability of inferences drawn from the observed data. It is necessary to increase the number of 
replications to decrease the variance of the treatment effect estimates and also to provide more power 
for detecting differences in treatment effects. We should not confuse multiple observations of the 
same experimental unit with replication. Replication involves applying the treatment to a number of 
experimental units. 
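

The gain from replication can be quantified under a simple assumption: if the r replicate units for a treatment respond independently with a common error standard deviation σ, the standard error of the treatment mean is σ/√r. The short Python sketch below tabulates this for a few values of r; the value of σ is hypothetical.

```python
import math


def standard_error_of_mean(sigma, replicates):
    """Standard error of a treatment mean based on `replicates` units,
    assuming independent units with common error standard deviation sigma."""
    return sigma / math.sqrt(replicates)


sigma = 2.0  # hypothetical error standard deviation
for r in (1, 2, 4, 8, 16):
    print(f"r = {r:2d}: standard error = {standard_error_of_mean(sigma, r):.3f}")
```

Under this assumption, halving the standard error requires four times as many replicates, which is one reason the number of replications has to be balanced against cost.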


Definition 9.2.6 A block is a portion of the experimental unit that is more likely to be homogeneous within 
itself than with other units. 


Blocking refers to the distribution of the experimental units into blocks in such a way that the units 
within each block are more or less homogeneous. The experimenter uses information of the possible 
variability among units to group them in such a way that most of the unwanted experimental error 
can be removed through the block effect. 


For blocking to be effective, the units should be arranged so that within-block variation is much
smaller than between-block variation. As an example, suppose a researcher wishes to compare the
yields of rice for four different kinds of fertilizers. In order to minimize the effect of environmental
and soil conditions, the field may be divided into smaller blocks, with each block further parceled
into four plots. Each kind of fertilizer is applied within each block, one to each plot. This method
ensures that the external conditions from plot to plot within a block will be relatively uniform. Then 
we can use the ANOVA methods to pool from block to block to obtain the within-block information 
about the treatment differences while avoiding between-block differences. The relevant analysis is 
given in Section 10.5. Time could also be a block factor, because the experimenter's concentration or expertise
could change as one carries out a task, such as determining disease levels or scoring microscope slides.
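

The fertilizer layout described above can be sketched in code. In the Python example below, each block receives all four fertilizers once, and the order of the fertilizers over the four plots is randomized separately within each block. The function name, the fertilizer labels, and the choice of five blocks are assumptions made for illustration.

```python
import random


def randomized_block_layout(treatments, n_blocks, seed=None):
    """Return one independently shuffled treatment order per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)      # separate randomization within each block
        layout.append(order)
    return layout


fertilizers = ["F1", "F2", "F3", "F4"]   # hypothetical labels
layout = randomized_block_layout(fertilizers, n_blocks=5, seed=1)
for block, order in enumerate(layout, start=1):
    print(f"block {block}: plots 1-4 receive {order}")
```

Because every fertilizer appears exactly once in every block, comparisons among fertilizers are made within blocks and are not affected by block-to-block differences.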


Definition 9.2.7 Randomization is the process of assigning experimental units to treatment conditions in 
an entirely chance manner. 


The main objective of randomization is to negate the effects of all uncontrolled extraneous variables. 
Usually, randomization is associated with design functions such as random sampling or selection, 
random assignment, and random order. Random assignment of experimental units to groups tends to 
spread out differences between subjects in unsymmetric or random ways so that there is no tendency 
to give an edge to any group. In any well-conducted experiment, randomization eliminates bias 
from the experiment, enables us to use statistical tests of significance, and creates valid estimates of 
experimental error. For instance, suppose we are measuring the time of flowering of plants in a glass 
house or a growth cabinet. If the pots are arranged so that all the plants of one variety are next to each 
other, and we observe that one variety flowers earlier than the rest, does this imply that this variety is 
inherently earlier-flowering, or does it suggest that the light and temperature conditions in that part 
of the cabinet or glass house cause plants to flower early? It is not possible to tell from an experiment 
designed in this manner. Randomizing the treatments in time or space is an insurance policy, to take 
account of variation that we may or may not know to exist under the conditions of our experiment. 
For instance, the levels of light in growth cabinets vary considerably, so randomizing the layout of the 
plants of different types is essential to make sure that no one type is consistently exposed to light and 
temperature levels that are particularly high or low. Another way of selecting experimental units is 
simply to use intact groups, such as all students in a particular statistics classroom. Results obtained 
this way may be highly biased and hence not desirable. It should be noted that random assignment 
does not completely eliminate the problem of correlated data values. 


Now we study some steps that can be used for randomization. Suppose there are N homogeneous 
experimental units and k treatments. In order to randomly assign $r_i$ experimental units to the ith
treatment with $\sum_{i=1}^{k} r_i = N$, we could use the following steps.


PROCEDURE FOR RANDOM ASSIGNMENT 


1. Number the experimental units from 1 to N. 
2. Use a random number table or statistical software to get a list of numbers that are random 
permutations of the numbers 1 to N. 
3. Give treatment 1 to the experimental units having the first $r_1$ numbers in the list. Treatment 2 will
be given to the next $r_2$ numbers in the list, and so on; give treatment k to the last $r_k$ units in the list.


The following example illustrates the random assignment procedure. 


Example 9.2.3 
In order to study the number of hours to relief provided by five different brands (A, B, C, D, E) of pain 
reliever, doses are administered to 25 subjects numbered 1 through 25 with each brand administered to 
five subjects. Develop a design using the random assignment procedure. 


Solution 


Using Minitab, we obtained the following random permutation of the numbers from 1 to 25. 


 1  8  7 12 10 25 23  4  6  3 
 9 21  5 24 18 16 22 14 17 15 
20 13  2 11 19 


Using the randomized procedure, we obtain the design given in Table 9.1. 


Table 9.1 
Subject:   1   8   7  12  10  25  23   4   6   3   9  21 
Brand:     A   A   A   A   A   B   B   B   B   B   C   C 


Subject:   5  24  18  16  22  14  17  15  20  13   2  11  19 
Brand:     C   C   C   D   D   D   D   D   E   E   E   E   E 

That is, subject number 8 will get brand A pain reliever, subject 23 will get brand B pain reliever, and so forth. 
We can rewrite Table 9.1 as shown in Table 9.2. 


Table 9.2 
Brand    Subject 
A         1   8   7  12  10 
B        25  23   4   6   3 
C         9  21   5  24  18 
D        16  22  14  17  15 
E        20  13   2  11  19 

It should be noted that once we create the design, the actual data will contain the number of hours to relief 
for each individual. 


It is important to note that randomization may not be possible in some cases. Observational studies 
may be necessary whenever the researcher cannot use controlled randomized experiments. For exam- 
ple, if we want to study the effect of smoking on lung cancer, randomization will mean that we should 
be able to select a group of people and tell a randomly selected subgroup to smoke and the other 
subgroup not to smoke. This is not only practically impossible; it is also unethical to deliberately 
expose people to a potentially hazardous substance. 


9.2.3 Some Specific Designs 


In this subsection, we will introduce three specific designs: completely randomized design, random- 
ized complete block design, and Latin square design. The structure of the experiment in a completely 
randomized design is presumed to be such that the treatments are assigned to the experimental units 
completely at random. Example 9.2.1 is one such design. In order to create a completely randomized 
design, follow the procedure given in Section 9.2.2. 


The randomized complete block design is a design in which the subjects are matched according to a 
variable that the experimenter wants to control. The subjects are put into groups (blocks) of the same 
size as the number of treatments. The elements of each block are then randomly assigned to different 
treatment groups so as to reduce the influence of unknown variables. For example, a researcher is 
carrying out a study of three different drugs for the treatment of high cholesterol. Suppose she has 
45 patients and divides them into three treatment groups of 15 patients each. Using a randomized 
block design, the patients are rated and put in blocks of three based on the cholesterol level: The 
three patients with the highest cholesterol are put in the first block, those with the next highest levels 
are put in the second block, and so on to the 15th block. The three members of each block are 
then randomly assigned, one to each of the three treatment groups. If there is very little extraneous, 
systematic variation, complete randomization allows differences between the mean effects of the 
treatments to be estimated with higher precision than other designs. However, it does not allow for 
the possibility that there could be some unknown extraneous factors, so if in doubt, use a randomized 
complete block design. 


Suppose we have k treatments and N experimental units. Further, assume that the experimental units 
can be grouped into b groups containing k experimental units, so that N = bk. We could use the 
following steps for a randomized complete block design. 


PROCEDURE FOR RANDOMIZATION IN A RANDOMIZED COMPLETE BLOCK DESIGN 
1. Group the experimental units into b groups (blocks) containing k homogeneous experimental 
units. 
2. In group 1, number the experimental units from 1 to k and obtain a random permutation of 
numbers 1 to k using a random number generator. 
3. In group 1, the experimental unit corresponding to the first number in the permutation receives 
treatment 1, the experimental unit corresponding to the second number in the permutation 
receives treatment 2, and so on. 
4. Repeat steps 2 and 3 for each of the remaining blocks. 
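
A minimal Python sketch of this block-by-block randomization is given below. It is an illustration only: the function name `randomized_complete_block`, the dictionary layout, and the `seed` argument are assumptions made here, and the block labels are placeholders.

```python
import random


def randomized_complete_block(blocks, treatments, seed=None):
    """Randomize k treatments within each block of k units.

    blocks: dict mapping block label -> list of k unit numbers
    treatments: list of k treatment labels (treatment 1, 2, ..., k)
    Returns a dict mapping block label -> {unit number: treatment}.
    """
    rng = random.Random(seed)
    design = {}
    for block, units in blocks.items():
        if len(units) != len(treatments):
            raise ValueError("each block needs exactly one unit per treatment")
        order = list(units)
        rng.shuffle(order)                            # step 2: permutation of 1..k
        design[block] = dict(zip(order, treatments))  # step 3: i-th number gets treatment i
    return design                                     # step 4: the loop repeats for every block


blocks = {"block 1": [1, 2, 3, 4], "block 2": [1, 2, 3, 4], "block 3": [1, 2, 3, 4]}
design = randomized_complete_block(blocks, ["T1", "T2", "T3", "T4"], seed=2)
for block, assignment in design.items():
    print(block, assignment)
```

Because a separate permutation is drawn for each block, every treatment appears exactly once in every block.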


We illustrate the step-by-step procedure just given in the following example. 


Example 9.2.4 
In order to study the number of hours to relief provided by five different brands (A, B, C, D, E) of pain relievers 
for pain resulting from different causes [headache (H), muscle pain (M), pain due to cuts and bruises (CB)], 
doses are administered to five subjects each having similar types of pain. Create a randomized complete 
block design. Choose, as blocks, the different types of pain (H, M, or CB). 


Solution 

Using Minitab with k=5 we have generated the random permutations shown in Table 9.3 for each of the 
b=3 blocks of five numbers and assigned the treatments according to the procedure just explained. As 
the table indicates, among persons with headache, subject number 3 is treated with brand A pain killer, and 
so forth. 


Table 9.3 


In the previous example, we had only one replication of each treatment per block. This idea can 
be generalized to have r replications of each treatment per block. Then the generalized randomized 
complete block design, with k treatments, b blocks, and r replications, has N = kbr experimental units 
in total, with kr homogeneous experimental units in each block, and can be randomized as follows. 


PROCEDURE FOR A RANDOMIZED COMPLETE BLOCK DESIGN WITH r REPLICATIONS 
1. Group the experimental units into b groups (called blocks), each containing rk homogeneous 
experimental units. 
2. In group 1, number the experimental units from 1 to rk and generate a list of numbers that are 
random permutations of the numbers 1 to rk. 


3. In group 1, assign treatment 1 to the experimental units having numbers given by the first r 
numbers in the list. Assign treatment 2 to the experimental units having the next r numbers in the list, and so 
on until treatment k receives r experimental units. 

4. Repeat steps 2 and 3 for the remaining blocks of experimental units. 
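
The replicated version differs from the single-replication case only in that each treatment takes r consecutive numbers from the permutation. The sketch below assumes Python; the function name `replicated_block_design` and the `seed` argument are hypothetical names introduced for illustration.

```python
import random


def replicated_block_design(block_names, treatments, r, seed=None):
    """Randomized complete block design with r replications per treatment.

    Within each block the units are numbered 1..r*k; the first r numbers of a
    random permutation receive treatment 1, the next r treatment 2, and so on.
    Returns a dict mapping block -> {treatment: list of unit numbers}.
    """
    rng = random.Random(seed)
    k = len(treatments)
    design = {}
    for block in block_names:
        units = list(range(1, r * k + 1))  # step 2: number the units 1..rk
        rng.shuffle(units)                 #         and permute them
        design[block] = {                  # step 3: consecutive slices of length r
            t: units[i * r:(i + 1) * r] for i, t in enumerate(treatments)
        }
    return design


# three blocks, five treatments, three replications: 15 units per block
design = replicated_block_design(["H", "M", "CB"], ["A", "B", "C", "D", "E"], r=3, seed=4)
for block, assignment in design.items():
    print(block, assignment)
```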


The following example illustrates this procedure. 


Example 9.2.5 
With the following modifications, consider Example 9.2.4. Three groups of subjects are considered, with 
each group having 15 subjects. Group I consists of subjects with only headache (H), group II of subjects 
with only muscle pain (M), and group III of subjects with only pain due to cuts and bruises (CB). Of the 15 with 
headache (group I), three are treated with brand A pain killer, three with brand B, and so forth. Subjects 
with other types of pain are treated similarly. Here the number of replications is three for each type of drug 
and for each type of pain. Create a randomized complete block design with three replications. 


Solution 

Using Minitab, for the group with headache (H), we generate a random permutation of numbers 1 to 15. The 
first three are given pain killer A, the next three B, and so forth. The process is repeated for the other types of 
pain. The design is given in Table 9.4, where "2(A)" means that patient 2 is given brand A pain killer. 


Table 9.4 
H M CB H M CB 


By increasing the number of replications, we can increase the accuracy of estimators of treatment 
means and the power of the tests of hypotheses regarding differences between treatment means. 
However, because of constraints such as cost, time needed to handle a large number of experimental 
units, and even availability of experimental units, it is not realistic to have a large number of 
replications. It is then necessary to determine the minimum number of replications needed to meet 
reasonable specifications on the accuracy of estimators or on the power of tests of hypotheses. We 
give a simple procedure for determining the number of replications needed. 


Let r be the number of replications that we need to determine. Let $\sigma$ be the experimental standard 
deviation, and $E$ be the desired accuracy of the estimator. Then the sample size required to be 
$(1 - \alpha)100\%$ confident that the estimator is within $E$ units of the true treatment mean, $\mu$, is 

$$r = \frac{(z_{\alpha/2})^2 \sigma^2}{E^2}.$$


The value of $\sigma$ could be obtained from past experiments, from a pilot study, or by using a rough 
estimator 


$$\hat{\sigma} = (\text{largest observation} - \text{smallest observation})/4.$$


Following is an example for determining the appropriate number of replications. 


Example 9.2.6 

A researcher wants to know the effect of class sizes on the mean score in a standardized test. She wants to 
estimate the treatment means $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ such that she will be 95% confident that the estimates 
are within 10 points of the true mean score. What is the necessary number of replications to achieve this 
goal? It is known from the previous experiments that scores have ranged from 46 to 98. 


Solution 
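A rough estimator of $\sigma$ is 

$$\hat{\sigma} = \frac{\text{Range}}{4} = \frac{98 - 46}{4} = 13.$$

From the normal table, $z_{0.025} = 1.96$. The value of $E = 10$. Thus, the number of replications necessary is 

$$r = \frac{(z_{\alpha/2})^2 \hat{\sigma}^2}{E^2} = \frac{(1.96)^2 (13)^2}{(10)^2} = 6.4923 \approx 7.$$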


Thus, the researcher should use seven replications of each of the treatments to obtain the desired precision. 
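
The arithmetic in this example can be checked with a short Python sketch. The helper names `rough_sigma` and `replications_needed` are assumptions made for illustration; the z-value is taken from the standard library's `statistics.NormalDist` rather than from a normal table, so it differs from 1.96 only in later decimal places.

```python
import math
from statistics import NormalDist


def rough_sigma(largest, smallest):
    """Rough estimate of sigma: range divided by 4."""
    return (largest - smallest) / 4


def replications_needed(sigma, E, alpha=0.05):
    """Smallest integer r with r >= (z_{alpha/2} * sigma / E)^2."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # upper alpha/2 point of N(0, 1)
    return math.ceil((z * sigma / E) ** 2)


sigma_hat = rough_sigma(98, 46)                         # 13.0
r = replications_needed(sigma_hat, E=10, alpha=0.05)    # rounds up to 7
print(sigma_hat, r)
```

Rounding up with `math.ceil` reflects the fact that a fractional number of replications is not possible and that rounding down would fall short of the stated precision.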


We have used the randomized complete block design when we wanted to control a single source 
of extraneous variation and there is only one factor of interest. When we need to compare k treat- 
ment means and there are two possible sources of extraneous variation, a Latin square design is the 
appropriate design of experiment. 


Definition 9.2.8 A k × k Latin square design contains k rows and k columns. The k treatments are 
randomly assigned to the rows and columns so that each treatment appears exactly once in every row and 
exactly once in every column of the design. 


It was the famous mathematician Leonhard Euler who introduced Latin squares in 1783 as a new 
kind of magic square. Even though the idea is fairly elementary, a systematic use of Latin squares in 
the design of experiments was advanced by Ronald A. Fisher only around 1921. Fisher realized that in 
a two-dimensional plot of land, the systematic error due to variation in soil and other factors could 
be minimized by a suitable Latin square partition of the plot. 


The following example illustrates a case in which the experimental problems are affected by two 
sources of extraneous variation, the type of car and type of driver used. 


Example 9.2.7 
A gasoline company is interested in comparing the effect of four gasoline additives (A, B, C, D) on the gas 
mileage achieved per gallon. Four cars (I, II, III, IV) and four drivers (1, 2, 3, 4) will be used in the experiment. 
Create a Latin square design. 


Solution 

We can filter out the variability due to the type of car used by ensuring that each additive type appears 
exactly once in each row. Also, to filter out the driver effect, use each additive only once for each driver. One such 
randomization results in the Latin square design given in Table 9.5. 


Table 9.5 
            Drivers 
Cars     1    2    3    4 
I        D    B    A    C 
II       C    A    D    B 
III      B    D    C    A 
IV       A    C    B    D 


To construct a basic Latin square, one can use the following method, which we will present only for the 4 × 4 
Latin square of Example 9.2.7. 


PROCEDURE FOR CONSTRUCTING A 4 x 4 LATIN SQUARE 
1. Begin with the first row as A, B, C, D. 
2. Generate each succeeding row by taking the first letter of the preceding row and placing it last, 
which has the effect of moving the other letters one position to the left. 
3. Randomly assign one block factor to the rows and the other to the columns. 
4. Randomly assign levels of the row factor, column factor, and treatment to row positions, column 
positions, and letters, respectively. 

In step 2 of the foregoing procedure, instead of using the cyclic placement for the rows, we can perform a 
cyclic placement for the columns. Steps 3 and 4 are then changed accordingly. 
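
The cyclic construction and the randomization in step 4 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function names `cyclic_latin_square`, `randomized_latin_square`, and `is_latin_square`, as well as the `seed` argument, are introduced here and do not come from the text. Step 3 (deciding which blocking factor labels the rows) is a labeling choice and is not coded.

```python
import random


def cyclic_latin_square(symbols):
    """Steps 1-2: row i is the first row shifted i places to the left."""
    k = len(symbols)
    return [[symbols[(i + j) % k] for j in range(k)] for i in range(k)]


def randomized_latin_square(symbols, seed=None):
    """Step 4: randomly permute the rows, the columns, and the letters."""
    rng = random.Random(seed)
    k = len(symbols)
    basic = cyclic_latin_square(list(range(k)))  # basic square on indices 0..k-1
    rows = rng.sample(range(k), k)               # random order of row positions
    cols = rng.sample(range(k), k)               # random order of column positions
    letters = rng.sample(list(symbols), k)       # random relabeling of treatments
    return [[letters[basic[i][j]] for j in cols] for i in rows]


def is_latin_square(square):
    """True if every symbol occurs exactly once in each row and each column."""
    k = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == k and set(row) == symbols for row in square)
    cols_ok = all({row[j] for row in square} == symbols for j in range(k))
    return len(symbols) == k and rows_ok and cols_ok


for row in cyclic_latin_square("ABCD"):
    print(" ".join(row))
print(is_latin_square(randomized_latin_square("ABCD", seed=7)))
```

Permuting rows, columns, or letters of a Latin square always yields another Latin square, which is why the check at the end succeeds for any seed.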


The following example illustrates a 4 x 4 Latin square design. 


Example 9.2.8 
Using the previous procedure, construct a Latin square for the case of Example 9.2.7. 


Solution 
Following the procedure just given for the setting of Example 9.2.7, the basic Latin square is represented 
by Table 9.6. 

Table 9.6 
            Drivers 
Cars     1    2    3    4 
I        A    B    C    D 
II       B    C    D    A 
III      C    D    A    B 
IV       D    A    B    C 


Now one random assignment of cars, I, II, III, IV, is to the rows 4, 3, 2, 1 (this is a random order of numbers 
1, 2, 3, 4) of Table 9.6. This gives Table 9.7. 


Table 9.7 
            Drivers 
Cars     1    2    3    4 
I        D    A    B    C 
II       C    D    A    B 
III      B    C    D    A 
IV       A    B    C    D 


Now one random assignment of the drivers 1, 2, 3, 4 is to the columns 1, 2, 4, 3 (this is a random order of 
numbers 1, 2, 3, 4) of Table 9.7, resulting in the Latin square shown in Table 9.8. 

Table 9.8 
            Drivers 
Cars     1    2    3    4 
I        D    A    C    B 
II       C    D    B    A 
III      B    C    A    D 
IV       A    B    D    C 

Now along with this Latin square, we can represent the corresponding observations (numbers in parentheses 
are the gas mileage in miles per gallon) as shown in Table 9.9. 


Table 9.9 
              Drivers 
Cars     1        2        3        4 
I        D(18)    A(22)    C(25)    B(19) 
II       C(22)    D(24)    B(26)    A(24) 
III      B(21)    C(20)    A(22)    D(23) 
IV       A(17)    B(24)    D(23)    C(21) 


Note that if we use the notation 1 for additive A, 2 for additive B, 3 for additive C, and 4 for additive 
D, the Latin square in the previous example can be rewritten as shown in Table 9.10. 


Table 9.10 
            Drivers 
Cars     1    2    3    4 
I        4    1    3    2 
II       3    4    2    1 
III      2    3    1    4 
IV       1    2    4    3 


This representation will be convenient if we need to write down a model. In order to test for the 
treatment effects, one could use the ANOVA methods discussed in Chapter 10. 


For Latin square experiments involving k treatments, it is necessary to include k observations for each 
treatment, resulting in a total of $k^2$ observations. Table 9.11 shows two examples of Latin squares, for 
k = 3 and k = 5. 


Table 9.11 
3 × 3        5 × 5 
A B C        A B C D E 
C A B        B A E C D 
B C A        C D A E B 
             D E B A C 
             E C D B A 


We have used the Latin square design to eliminate two extraneous sources of variability. In order to 
eliminate three extraneous sources of variability, we can use a design called the Greco-Latin square. 
Greco-Latin squares are also called orthogonal Latin squares. This design consists of k Latin and k Greek 
letters. In this design, we take a Latin square and superimpose upon it a second square with treatments 
denoted by Greek letters. In this superimposed square, each Latin letter coincides exactly once 
with each Greek letter. In our gasoline example, if we introduce the effect of, say, four different days, 
represented by Greek letters, then Table 9.12 shows the 4 × 4 Greco-Latin square. 


Table 9.12 
Aα    Bβ    Cγ    Dδ 
Bδ    Aγ    Dβ    Cα 
Cβ    Dα    Aδ    Bγ 
Dγ    Cδ    Bα    Aβ 
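
The defining property of a Greco-Latin square, that both component squares are Latin and that every (Latin, Greek) pair occurs exactly once, can be checked mechanically. The Python sketch below does this for Table 9.12, reading its Greek letters as α, β, γ, δ; the function names `is_latin` and `is_orthogonal` are assumptions introduced for illustration.

```python
def is_latin(square):
    """True if every symbol occurs exactly once in each row and each column."""
    k = len(square)
    symbols = set(square[0])
    return (len(symbols) == k
            and all(len(row) == k and set(row) == symbols for row in square)
            and all({row[j] for row in square} == symbols for j in range(k)))


def is_orthogonal(first, second):
    """True if superimposing the squares gives k*k distinct ordered pairs."""
    k = len(first)
    pairs = {(first[i][j], second[i][j]) for i in range(k) for j in range(k)}
    return len(pairs) == k * k


# Table 9.12 split into its Latin-letter and Greek-letter components
latin = ["ABCD", "BADC", "CDAB", "DCBA"]
greek = ["αβγδ", "δγβα", "βαδγ", "γδαβ"]

print(is_latin(latin), is_latin(greek), is_orthogonal(latin, greek))
```

A square superimposed on itself fails the orthogonality check, since only k distinct pairs appear.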


We will not go into more detail on this design, or on the many other similar designs. 


When developing an experimental design, it is important for the researcher to learn more about 
the terminology as well as the intricacies of the field in which the experiment will be performed. It 
is also important to observe that there are many other practical constraints affecting the design of 
experiments. For example, experiments are done by organizations and individuals that have limited 
resources of money and time. Appropriating these resources within the constraints is an integral 
part of planning an experiment. Also, many problems are approached sequentially in several stages. 
Planning for each stage is built on what has been learned before. Dealing with these types of issues 
is beyond the scope of this book. 


EXERCISES 9.2 


9.2.1. In order to study the conditions under which hash-brown potatoes will absorb the least 
amount of fat, an experiment is conducted with four frying durations (2 min, 3 min, 4 min, 
5 min) and using four different types of fats (animal fat I, animal fat II, vegetable fat I, 
vegetable fat II). The amount of fat absorbed is recorded. Identify each of the following 
elements: response, factors, and factor type(s). Write all the treatments from the factor-level 
combinations. 


9.2.2. A team of scientists is interested in the effects of vitamin A, vitamin C, and vitamin D 
on the number of offspring born for a specific species of mice. An experiment is set up 
using the same species of mice. The mice are randomly assigned to three groups. Each 
mouse in the study gets the same amount of food and daily exercise and is kept at the same 
temperature. One group of mice gets extra vitamin A, another group gets extra vitamin C, 
and the remaining group gets extra vitamin D. The supplements are added to their food. 
The number of offspring are counted and recorded for each group. 


(a) What is the response variable? 
(b) What is the factor? 


9.2.3. Thirty rose bushes are numbered 1 to 30. Three different fertilizers are to be applied to 10 
bushes each. Develop a design using the random assignment procedure. 


9.2.4. Three different fertilizers are to be applied to five bushes each for three varieties of flower 
plants: gardenia (G), rose (R), and jasmine (J). Create a randomized complete block design. 
Choose as blocks the different types of plants (G, R, or J). 


9.2.5. With the following modifications, consider Exercise 9.2.4. Three groups of flower plants are 
considered, with each group having nine plants. Group I consists of gardenia (G), group II 
consists of rose (R), and group III consists of jasmine (J). Of the nine gardenias (group I), 
three are treated with brand A fertilizer, three with brand B, and three with brand C fertilizer. 
Other plant types are treated similarly. Here the number of replications is three for each type 
of fertilizer and for each type of plants. Create a randomized complete block design with 
three replications. 


9.2.6. What are the reasons for using randomization in Exercises 9.2.3 to 9.2.5? 


9.2.7. Suppose a food processing company wants to package sliced pineapples in cans. They have 
four different processing plants, say, A, B, C, and D. Suppose they have 56 truckloads (num- 
bered 1 to 56) of pineapples collected from different parts of the country. In order to get 
some uniformity in taste, it is better to randomly assign the trucks to the four plants. Develop 
a design using the random assignment procedure. 


9.2.8. In Exercise 9.2.1, suppose there are four pans and 24 packets of hash-brown potatoes. 
Randomly select six of the 24 packets to be fried with each of the fat types. 


(a) Create a randomized complete block design. 
(b) Create a Latin square design. 


9.2.9. A chemist is interested in the effects of five different catalysts (A, B, C, D, E) on the reaction 
time of a chemical process. There are five batches of new material (1, 2, 3, 4, 5). She decides 
to study the effect of each catalyst on each material for 5 days (1, 2, 3, 4, 5). Construct a 
Latin square design for this experiment. 


9.2.10. Suppose a dating service wants to schedule dates for four women, Anna, Carol, Judy, and 
Nancy, with Ed, John, Marcus, and Richard on Thursday, Friday, Saturday, and Sunday in 
such a way that each man dates each woman in the 4 days. Create a Latin-square design 
displaying a schedule that the dating service could follow. 


9.2.11. In order to test the relative effectiveness of four different fertilizer mixtures on an orange crop, 
a Florida farmer applies the fertilizer and measures the yield per unit area at harvest. 
The four experiments cannot be carried out on the same plot of land. Devise a Latin square 
arrangement of dividing a single plot into a 4 x 4 grid of subplots for administering the 
fertilizers (labeled randomly A, B, C, D). 


9.2.12. A researcher wants to know the effect of four different types of fertilizers on the mean number 
of tomatoes produced. He wants to estimate the treatment means $\mu_1$, $\mu_2$, $\mu_3$, and $\mu_4$ such 
that he will be 90% confident that the estimates are within five tomatoes of the true mean 
number of tomatoes. What is the necessary number of replications to achieve this goal? It 
is known from previous experiments that the numbers of tomatoes per plant have ranged 
from 20 to 60. 


9.3 FACTORIAL DESIGN 


In this section, we introduce a treatment design where the treatments are constructed from several 
factors rather than just being k levels of a single factor. The treatments are combinations of levels of 
the factors. A factorial experiment can be defined as an experiment in which the response variable is 
observed at all factor-level combinations of the independent variables. A factorial design is used to 
evaluate two or more factors simultaneously. In general, there are three ways to obtain experimental 
data: one-factor-at-a-time, full factorial, and fractional factorial. The most efficient design is the 
fractional factorial. A simple approach for examining the effect of multiple factors is the one-factor-at-a-time 
approach. The advantage of factorial designs over one-factor-at-a-time experiments is that they allow 
interactions to be detected. An interaction occurs when the effect of one factor varies with the level of 
another factor or, when there are more than two factors, with some combination of levels of the other factors. 


The one-way analysis of variance, discussed in the next chapter, enables us to compare several groups 
of observations, all of which are independent with the possibility of a different mean for each group. 
The test of significance examines whether or not all the means are equal. Two-way analysis of variance is a way 
of studying the effects of two factors separately, such as their main effects, and together, with their 
interaction effect. 


9.3.1 One-Factor-at-a-Time Design 


In one-factor-at-a-time design, one conducts the experiment with one factor at a time. Here we hold 
all factors constant except one and take measurements on the response variable for several levels 
of this one factor, then choose another factor to vary, keeping all others constant, and so forth. We 
are familiar with this type of experiment from undergraduate chemistry or physics labs. One of the 
drawbacks of this method is that each factor is evaluated while the other factors are held at a single setting. 
For example, in the case of Example 9.2.2, we would set a fixed temperature and study the effect of 
water salinity on fish weight gains, and then set a fixed water salinity and vary temperature. All this is 
time consuming. 


Example 9.3.1 

Consider the following hypothetical data, in which two types of diet (fat, carbohydrates) in two levels (high, 
medium) were administered for a week for a sample of individuals. At the end of the week, each subject 
was put on a treadmill and time of exhaustion, in seconds, was measured. The objective was to determine 
the factor-level combination that will give maximum time of exhaustion. Table 9.13 gives average time to 
exhaustion for each combination of diet. 

Discuss this as a one-factor-at-a-time experiment to predict average time of exhaustion. 


Table 9.13 
Average time to exhaustion    Fat       Carbohydrate 
88                            High      Medium 
98                            Medium    Medium 
77                            Medium    High 
74                            High      High 


Solution 

We can see that the average time of exhaustion decreases when fat content is increased from medium to high 
while holding carbohydrate at medium. The average time of exhaustion also decreases when carbohydrate 
content is increased from medium to high while holding fat at medium. Thus, it is tempting to predict that 
increasing both fat and carbohydrate consumption will result in a lower average time of exhaustion. The 
problem with this reasoning is that the prediction is based on the assumption that the effect 
of one factor is the same for both levels of the other factor. Changing the fat content from medium to 
high, keeping carbohydrate at medium, reduced the average time of exhaustion by 10 seconds (from 98 to 88), 
and changing the carbohydrate content from medium to high, keeping fat at medium, reduced it by 
21 seconds (from 98 to 77). The question then is, can we 
predict that increasing both fat and carbohydrate content to high will lower the average time of exhaustion 
to approximately 98 - 10 - 21 = 67 seconds? To answer this question, we need to administer high levels of both diets to 
a sample and observe the average time of exhaustion. If it is 67 seconds, then our prediction is correct. 
However, what if the observation is 74 seconds? The average time of exhaustion has been lowered, but not 
as much. If this happens, we say that the two factors interact. When factors interact, the effect of one factor 
on the response is not the same for different levels of the other factor. Hence, the information obtained from 
the one-factor-at-a-time approach would lead to an invalid prediction. 


The factor-level combination for a one-factor-at-a-time approach of Example 9.3.1 can be seen from 
Figure 9.1. 


If there is no interaction, we get Figure 9.2, which shows average time to exhaustion with three given 
points and a possible point of around 68 seconds. 


Definition 9.3.1 Two factors I and II are said to interact if the difference in mean responses for different 
levels of one factor is not constant across levels of the second factor. 
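
Definition 9.3.1 can be made concrete with the cell means of Table 9.13. The short Python sketch below compares the effect of fat at each carbohydrate level; the variable names are assumptions introduced here, and the calculation uses only the four averages reported in the table.

```python
# Cell means from Table 9.13, keyed by (fat level, carbohydrate level).
means = {
    ("medium", "medium"): 98,
    ("high", "medium"): 88,
    ("medium", "high"): 77,
    ("high", "high"): 74,
}

# Effect of raising fat from medium to high, at each carbohydrate level.
fat_effect_at_medium_carb = means[("high", "medium")] - means[("medium", "medium")]
fat_effect_at_high_carb = means[("high", "high")] - means[("medium", "high")]

# Effect of raising carbohydrate from medium to high, with fat at medium.
carb_effect_at_medium_fat = means[("medium", "high")] - means[("medium", "medium")]

# Prediction for (high, high) if the two effects simply added up.
additive_prediction = (means[("medium", "medium")]
                       + fat_effect_at_medium_carb
                       + carb_effect_at_medium_fat)

# A nonzero difference between the two fat effects signals interaction.
interaction = fat_effect_at_high_carb - fat_effect_at_medium_carb

print(fat_effect_at_medium_carb, fat_effect_at_high_carb)  # -10 and -3
print(additive_prediction, means[("high", "high")])        # 67 versus 74
print(interaction)                                         # 7
```

The fat effect is -10 seconds at medium carbohydrate but only -3 seconds at high carbohydrate, so the difference in mean responses is not constant across levels of the second factor.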


If there is interaction, the lines in Figure 9.2 will not be parallel and might cross each other, in which case a 
one-factor-at-a-time approach may not be the appropriate design. In that case, the following alternative designs will 
give more informative data. 


9.3.2 Full Factorial Design 

One way to get around the problem of interaction in one-factor-at-a-time design is to evaluate all 
possible combinations of factors in a single experiment. This is called a full factorial experiment. The 
main benefit of a full factorial design is that every possible data point is collected, so the choice of 
optimum condition becomes easy. For example, in an experiment such as the one in Example 9.2.2, 
one could conduct a full factorial design. The simplest form of factorial experiment involves two 
factors only and is called a two-way layout. A full factorial experiment with n factors and two levels for 
each factor is called a $2^n$ factorial experiment. A full factorial experiment is practical if only a few factors 
(say, fewer than five) are being investigated. Beyond that, this design becomes time consuming and 
expensive. 
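
Enumerating the treatments of a full factorial experiment amounts to taking the Cartesian product of the factor levels. The Python sketch below illustrates this; the function name `full_factorial` is an assumption, and the factor names and levels are borrowed from the diet setting of Example 9.3.1 purely for illustration.

```python
from itertools import product


def full_factorial(factors):
    """List every factor-level combination.

    factors: dict mapping factor name -> list of levels
    Returns a list of dicts, one per treatment.
    """
    names = list(factors)
    return [dict(zip(names, combo))
            for combo in product(*(factors[name] for name in names))]


diet = {"fat": ["medium", "high"], "carbohydrate": ["medium", "high"]}
runs = full_factorial(diet)  # a 2^2 factorial: four treatments
for run in runs:
    print(run)

# With n two-level factors the number of treatments is 2^n.
five_factors = {f"factor_{i}": ["low", "high"] for i in range(1, 6)}
print(len(full_factorial(five_factors)))  # 32
```

The count doubles with every added two-level factor, which is why the design quickly becomes expensive.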


9.3.3 Fractional Factorial Design 


In a fractional factorial experiment, only a fraction of the possible treatments are actually used in the 
experiment. A full factorial design is the ideal design, through which we could obtain information 
on all main effects and interactions. But because of the prohibitive size of the experiments, such 
designs are often not practical to run. For instance, consider Example 9.2.2. If we were to add, say, two 
different densities, three sizes of fish, and three types of food, the number of factors becomes five, 
and the total number of distinct treatments will be 4 × 4 × 2 × 3 × 3 = 288. A full factorial experiment of this 
size is very time consuming and expensive. The number of practically significant effects in a factorial design is 
usually relatively small. In these types of situations, fractional factorial experiments are used in which trials 
are conducted on only a well-balanced subset of the possible combinations of levels of factors. This 
allows the experimenter to obtain information about the main effects and the most important interactions while keeping 
the size of the experiment manageable. The experiment is carried out in a single systematic effort. 
However, care should be taken in selection of treatments in the experiment so as to be able to answer 
as many relevant questions as possible. The fractional factorial design is useful when the number of 
factors is large. Because we are reducing the number of treatment combinations, a fractional factorial design will not 
be able to evaluate the influence of some of the factors independently. Of course, the question is how 
to choose the factors and levels we should use in a fractional factorial design. The question of how 
fractional factorial designs are constructed is beyond the scope of this book. 
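
Although the construction of fractional factorial designs is not developed here, a small Python sketch can illustrate the two points made above: how quickly the treatment count grows, and what a "well-balanced subset" can look like. The half fraction below is one standard textbook choice (keeping the runs of a $2^3$ design whose coded levels multiply to +1); it is shown as an assumption for illustration, not as the method of this book.

```python
from itertools import product
from math import prod

# Treatment count for the five factors with the level counts quoted in the text.
level_counts = [4, 4, 2, 3, 3]
n_treatments = prod(level_counts)
print(n_treatments)  # 288

# Full 2^3 design with levels coded as -1 (low) and +1 (high).
full = list(product([-1, 1], repeat=3))

# Illustrative half fraction: keep the runs with a * b * c == +1.
half = [(a, b, c) for a, b, c in full if a * b * c == 1]
for run in half:
    print(run)

# Balance: each factor is at its low and high level equally often.
print([sum(run[i] for run in half) for i in range(3)])  # [0, 0, 0]
```

In this half fraction the level of the third factor always equals the product of the first two, so its effect cannot be separated from the interaction of the other two factors. This is an instance of the loss of independent information mentioned above.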


EXERCISES 9.3 


9.3.1. Suppose a large retail chain decides to introduce clothing in two qualities of material (ordinary, 
fine). Each store will have two different proportions (40%, 60%) displayed. At the 
end of the month, profits from each store for these two types of clothing are recorded. 
Table 9.3.1 represents the average profits for each of the quality-proportion combinations. 


Table 9.3.1 


Average profit Quality Proportion 


$10,000 Fine 40% 
$25,000 Ordinary 40% 
$9500 Ordinary 60% 
? Fine 60% 


Discuss this as a one-factor-at-a-time experiment to predict the average amount of profit. 


9.3.2. Draw graphs for the data to represent quality-proportion combinations (a) for the one- 
factor-at-a-time approach, and (b) for the case where there is no interaction. 


9.3.3. Discuss how a fractional factorial design can be performed for the problem in Exercise 9.3.1. 


9.3.4. Suppose a researcher wants to conduct a series of experiments to study the effect of fertilizer 
and temperature on plant growth. She uses four different brands of fertilizers at three different 
temperature settings for rose plants of the same age and of similar growth. 

(a) How many factor-level combinations are possible in this experiment? 
(b) Each experiment makes use of one fertilizer-temperature combination (one-factor-at-a- 
time design). How should she implement randomization in this experiment? 


9.4 OPTIMAL DESIGN 


In 1959, J. Kiefer presented a paper to the Royal Statistical Society about his work on the theory of 
optimal design. He was trying to answer the major question, “How do we find the best design?” This 
work initiated a whole new field of optimal design. The methods of optimal experimental design 
provide the technical tools for building experimental designs to attain well-defined objectives with 
efficiency and with minimum cost. The cost can be the monetary cost, time, number of experimental 
runs, and so on. There are many methods of achieving optimal designs such as sequential (simplex) 
or simultaneous experiment designs. In sequential design, experiments are performed in succession 
in a direction of improvement until the optimum is reached. Simultaneous experiment designs such 
as response surface designs are used to build empirical models. A survey by Atkinson in 1988 contains 
many references on optimal design. 


In this section, we focus only on one simple example to illustrate the ideas of optimal design in terms 
of choosing appropriate sample size. It is not possible to have a single design that is best for securing 
information concerning all types of population parameters. Indeed, it is beyond the scope of this 
section to present a general theory of optimal design. 


9.4.1 Choice of Optimal Sample Size 


Sample size estimation is an essential part of experimental design; without it, the sample size may 
be much too large or much too small. If the sample size is too small, the experiment will lack the accuracy to provide 
dependable answers to the questions we are investigating. If the sample size is too large, time and resources 
will be wasted, often for insignificant gain. We now illustrate a simple case of optimal sample size 
determination. 


Let $X_{11}, \ldots, X_{1n_1}$ be a random sample from population 1 with mean $\mu_1$ and variance 
$\sigma_1^2$, and let $X_{21}, \ldots, X_{2n_2}$ be a random sample from population 2 with mean $\mu_2$ 
and variance $\sigma_2^2$. Assume that the two samples are independent. Then we know that 
$\bar{X}_1 - \bar{X}_2$ is an unbiased estimator of $\mu_1 - \mu_2$ with standard error 

$$\sigma_{(\bar{X}_1 - \bar{X}_2)} = \sqrt{\operatorname{Var}(\bar{X}_1 - \bar{X}_2)} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$

Suppose that there is a restriction that the total number of observations should be $n$, that is, 
$n_1 + n_2 = n$. Such a restriction may be due to cost factors or to a shortage of available subjects. 
An important design question is how to choose the sample sizes $n_1$ and $n_2$ so as to maximize the 
information in the data relevant to the parameter $\mu_1 - \mu_2$. We know that the samples contain 
maximum information when the standard error is minimum. Hence, the problem reduces to minimization 
of $\operatorname{Var}(\bar{X}_1 - \bar{X}_2)$. Let $a = n_1/n$ be the fraction of the $n$ observations 
that is assigned to sample 1. Then $n_1 = na$ and $n_2 = n(1 - a)$, and we have 

$$\operatorname{Var}(\bar{X}_1 - \bar{X}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} = \frac{\sigma_1^2}{na} + \frac{\sigma_2^2}{n(1 - a)}.$$

The problem is now reduced to finding an $a$ that minimizes the function 

$$g(a) = \frac{\sigma_1^2}{na} + \frac{\sigma_2^2}{n(1 - a)}.$$

This is an optimization problem that can be solved using calculus. By taking the derivative of $g(a)$ 
with respect to $a$ and equating it to zero, we have 

$$\frac{d}{da} g(a) = -\frac{\sigma_1^2}{na^2} + \frac{\sigma_2^2}{n(1 - a)^2} = 0.$$

Multiplying throughout by $na^2(1 - a)^2$, we have 

$$-\sigma_1^2 (1 - a)^2 + \sigma_2^2 a^2 = 0,$$

which results in the quadratic equation 

$$(\sigma_2^2 - \sigma_1^2) a^2 + 2\sigma_1^2 a - \sigma_1^2 = 0.$$

Using the quadratic formula, we obtain the two roots as 

$$a_1 = \frac{\sigma_1}{\sigma_1 + \sigma_2} \quad \text{and} \quad a_2 = \frac{\sigma_1}{\sigma_1 - \sigma_2}.$$

However, $a_2$ cannot be the solution because, if $\sigma_1 > \sigma_2$, then $a_2 > 1$; otherwise 
$a_2 < 0$. Neither is admissible because $a$ is a fraction. Hence, 

$$a = \frac{\sigma_1}{\sigma_1 + \sigma_2} \quad \text{and} \quad 1 - a = \frac{\sigma_2}{\sigma_1 + \sigma_2}.$$

Using the second derivative test, we can verify that this indeed is a minimum for 
$\operatorname{Var}(\bar{X}_1 - \bar{X}_2)$. From this analysis we can see that the sample sizes that 
maximize the information in the data relevant to the parameter $\mu_1 - \mu_2$ subject to the constraint 
$n_1 + n_2 = n$ are 

$$n_1 = \frac{\sigma_1}{\sigma_1 + \sigma_2}\, n \quad \text{and} \quad n_2 = \frac{\sigma_2}{\sigma_1 + \sigma_2}\, n.$$

As a special case, we can see that when $\sigma_1^2 = \sigma_2^2$, the optimal design is to take $n_1 = n_2$. 
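
The allocation rule above can be checked numerically. The following Python sketch is only an illustration: the function names and the values $\sigma_1 = 1$, $\sigma_2 = 3$, $n = 40$ are assumptions chosen for convenience, not taken from the text. It computes the optimal split and compares the resulting variance with that of an equal split.

```python
def optimal_allocation(n, sigma1, sigma2):
    """Split n observations in proportion to the two standard deviations."""
    a = sigma1 / (sigma1 + sigma2)      # optimal fraction assigned to sample 1
    return n * a, n * (1 - a)


def var_diff(n1, n2, sigma1, sigma2):
    """Variance of the difference between the two sample means."""
    return sigma1 ** 2 / n1 + sigma2 ** 2 / n2


# Hypothetical illustration: sigma1 = 1, sigma2 = 3, and n = 40 observations in total.
n, s1, s2 = 40, 1.0, 3.0
n1, n2 = optimal_allocation(n, s1, s2)
print("optimal split:", n1, n2)
print("variance at the optimum:", var_diff(n1, n2, s1, s2))
print("variance with equal split:", var_diff(n / 2, n / 2, s1, s2))
```

In practice the computed $n_1$ and $n_2$ need not be integers, so they would have to be rounded to neighboring whole numbers that still sum to $n$.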


EXERCISES 9.4 


9.4.1. A total of 100 sample points were taken from two populations with variances $\sigma_1^2 = 4$ and 
$\sigma_2^2 = 9$. Find $n_1$ and $n_2$ that will result in the maximum amount of information about 
$(\mu_1 - \mu_2)$. 


9.4.2. Suppose in Exercise 9.4.1 we want to take $n = n_1 = n_2$. How large should $n$ be to obtain the 
same information as that implied by the solution of Exercise 9.4.1? 


9.5 THE TAGUCHI METHODS 


Taguchi methods were developed by Genichi Taguchi to improve the implementation of total quality 
control in Japan. These methods are claimed to have provided as much as 80% of Japanese quality 
gains. They are based on the design of experiments to provide near-optimal quality characteristics 
for a specific objective. A special feature of Taguchi methods is that they integrate the methods of 
statistical design of experiments into a powerful engineering process. The Taguchi methods are in 
general simpler to implement. 


Taguchi methods are often applied on the Japanese manufacturing floor by technicians to improve 
their processes and their product. The goal is not just to optimize an arbitrary objective function, but 
also to reduce the sensitivity of engineering designs to uncontrollable factors or noise. The objective 
function used is the signal-to-noise ratio, which is then maximized. This moves design targets toward 
the middle of the design space so that external variation affects the behavior of the design as little as 
possible. This permits large reductions in both part and assembly tolerances, which are major drivers of 
manufacturing cost. Linking quality characteristics to cost through the Taguchi loss function (Taguchi 
and Yokoyama, 1994) was a major advance in quality engineering, as well as in the ability to design 
for cost. Taguchi methods are also called robust design. In 1982, the American Supplier Institute 
introduced Dr. Taguchi and his methods to the U.S. market. 


Using a well-planned experimental design, such as a fractional factorial design, it is possible to 
efficiently obtain information about the model and the underlying process. Clearly, the purpose of 
these methods is to control and ensure the quality of the end product. In the conventional approach, 
this is achieved by further testing a few end products that are randomly chosen or using control 
charts and making decisions based on certain preset criteria, such as acceptable or unacceptable. 
Thus, “quality” of the product is thought of as inside or outside of specifications. Instead, Taguchi 
suggested that we should specify a target value, and the quality should be thought of as the variation 
from the target. 


As an example, suppose we make $n$ observations of the output, $x_1, \ldots, x_n$, of a process at times 
$1, 2, \ldots, n$, as shown in Figure 9.3. 


The control chart consists of a plot of observed output values ($x_i$'s) on the y-axis and the times of 
observation, $1, 2, \ldots$, on the x-axis, as shown in the figure. The letter $T$ represents the target 
value. If the output value is between the lower and upper limits $T_L$ and $T_U$, the process is deemed 
to be operating satisfactorily; otherwise the process is said to be out of control and the output value 
is considered unsatisfactory. 


FIGURE 9.3 Control plot of processing times and outputs. 

FIGURE 9.4 Loss function. 


Some other examples are (1) defining specification limits for acceptance, such as stating that the 
diameter of bolts must be between 9.8 mm and 10.2 mm with mean 10 mm, and (2) that the waiting 
time in a line should be less than 30 minutes for at least 90% of customers. 


In all these situations, the specifications partition the state of the process as acceptable or unacceptable, 
that is, it classifies the state as a dichotomy. This is often called the “goal post mentality.” 


The basic idea of the Taguchi approach is a shift in mindset from demarking the quality as acceptable 
or unacceptable to a more flexible and realistic classification. The traditional approach to quality 
control does not take into account the size of departure from the target value. To accommodate the 
size of such departure as a significant factor in quality control, let us introduce the concept of loss 
function (see Chapter 11). If an output value x differs from the target value T, let L(T, x) denote the 
loss incurred, say in dollars. Other possible losses could also be reputation or customer satisfaction. 


For the control chart example, we can assign the loss function 


$$L(T, x) = \begin{cases} 0, & \text{if } T_L \le x \le T_U \\ L, & \text{if } x > T_U \text{ or } x < T_L \end{cases}$$


where L is a constant and x is the measured value. This is schematically shown in Figure 9.4. 


From Figure 9.4, it is seen that we view outputs $x_1$ and $x_2$ as having equal quality, whereas $x_2$ and 
$x_3$ are considered to have vastly differing quality ($x_2$ is acceptable and $x_3$ is not acceptable). 
A more reasonable conclusion would be that $x_1$ has excellent quality, whereas $x_2$ and $x_3$ are 
similar, both being poor. 


FIGURE 9.5 Quadratic loss function. (Loss $L(T, x)$ plotted against $x$, with $T_L$, $T$, and $T_U$ 
marked on the horizontal axis.) 


In Taguchi's approach, the loss function takes into account the size of departure from the target value. 
For example, a popular choice for the loss function is 


$$L(T, X) = k(X - T)^2,$$


where 
L = loss incurred, 
k = constant, 
X = actual value of the measured output, and 
T = target value. 


We can schematically represent the behavior as shown by Figure 9.5. 


This form of loss function is called the quadratic loss function. The choice of $k$ depends on the 
particular problem. For example, the scaling factor $k$ can be used to convert loss into monetary units 
to accommodate comparisons of systems with different capital loss. Or, in product manufacturing, let 
$D$ denote the allowed deviation from the target, and let $A$ denote the loss due to a defective product. 
Then a choice of $k$ can be $k = A/D^2$, so that a deviation of exactly $D$ from the target gives a loss 
of $A$. As shown earlier, the average loss is $E(L)$ and is given by 


$$E(L) = k\left[(E(X) - T)^2 + \sigma^2\right] = k\left[(\text{bias})^2 + \text{variance}\right]$$


where $\sigma^2$ is the variance of $X$ (measured quality, which is assumed to be random). In Taguchi's 
approach, the variation from the target can be broken into components containing bias and product 
variation. Thus, if our aim is to minimize the expected loss, $E(L)$, we should not only require 
$E(X) = \mu$ to be close to $T$ but also should reduce the variance. It turns out that often these 
requirements are contradictory. The objective is to choose the design parameters (the factors that 
influence the quality) optimally to obtain the best quality product. In practice, the parameters $\mu$ 
and $\sigma^2$ are not known and are estimated by $\bar{X}$ and $S^2$, respectively. This results in 
the Taguchi loss function 


$$L = k\left[(\bar{X} - T)^2 + S^2\right].$$


This loss function penalizes small deviations from $T$ only slightly, while assessing a larger penalty 
for responses far from the target. The expected loss is similar to a mean squared error loss, which we 
have seen in regression analysis in the form of least squares. 


Why is controlling both bias and variance important? Suppose you want your community swimming 
pool temperature at 80°F, which is the $T$ here. Suppose the temperature varies between 60°F and 
100°F with an average of 80°F. Then the bias is zero; however, it will be pretty uncomfortable to swim 
at 60°F or 100°F. Here the bias takes the ideal value of zero, but the variance is large. In another 
scenario, the variance may be small, but the average temperature may be far from the target value of 
80°F (for example, the temperature is constant at 60°F). Hence, we want the pool temperature to be near 
to the target value of 80°F with as small variance as possible (say, within 1°F to 2°F). 
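
The pool example can be made concrete with a small computation. The sketch below is a hedged illustration in Python: the temperature readings are invented, $k = 1$ is an arbitrary choice, and the function name `taguchi_loss` is hypothetical. It evaluates the estimated loss $k[(\bar{X} - T)^2 + S^2]$ and reports its bias and variance parts separately.

```python
from statistics import mean, variance


def taguchi_loss(values, target, k=1.0):
    """Return (loss, squared bias, sample variance) for the estimated Taguchi loss."""
    xbar = mean(values)
    s2 = variance(values)               # sample variance with divisor n - 1
    bias_sq = (xbar - target) ** 2
    return k * (bias_sq + s2), bias_sq, s2


T = 80  # target pool temperature in degrees Fahrenheit

# Invented readings for three hypothetical pools.
wide = [60, 70, 80, 90, 100]            # on target on average, but widely spread
cold = [60, 60.5, 59.5, 60, 60]         # tightly clustered, but far from the target
good = [79, 80, 81, 80, 80]             # close to the target with small spread

for name, temps in [("wide", wide), ("cold", cold), ("good", good)]:
    loss, bias_sq, s2 = taguchi_loss(temps, T)
    print(name, "loss =", loss, "bias^2 =", bias_sq, "variance =", s2)
```

Only the third set of readings keeps both components small, which is the point of the swimming pool illustration.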


Taguchi coined the term design parameters as the generic description for factors that may influence 
the quality and whose levels we want to optimize. Taguchi's philosophy is to “design quality in” 
rather than to weed out the defective items after manufacturing. In order to obtain an optimal set of 
design parameters that affect the quality of the end product, the Taguchi method utilizes appropriately 
designed experiments. More specifically, orthogonal arrays are used for fractional factorial designs. 
Taguchi provides tables for these designs so that even a nonspecialist can use them. For two-level 
designs (high, low), we have a table for an $L_4$ orthogonal array for up to three factors; a table for 
an $L_8$ orthogonal array for up to seven factors; and so forth. Similar tables are available for 
three-level designs. 
We will not describe these design issues in this section. We refer the reader to specialized books on 
the subject for further details. 
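
To give a feel for what such a table contains, the following Python sketch writes down the standard $L_4$ array (four runs, three two-level factors, levels coded 1 and 2) and checks the defining orthogonality property: in every pair of columns, each combination of levels appears equally often. The helper name `is_orthogonal` is an assumption made for this illustration.

```python
from collections import Counter
from itertools import combinations, product

# Standard L4 orthogonal array: 4 runs (rows) for up to 3 two-level factors (columns).
L4 = [
    (1, 1, 1),
    (1, 2, 2),
    (2, 1, 2),
    (2, 2, 1),
]


def is_orthogonal(array, levels=(1, 2)):
    """Check that every pair of columns contains each level pair equally often."""
    n_cols = len(array[0])
    expected = len(array) // len(levels) ** 2
    for c1, c2 in combinations(range(n_cols), 2):
        counts = Counter((row[c1], row[c2]) for row in array)
        if any(counts[pair] != expected for pair in product(levels, repeat=2)):
            return False
    return True


full_factorial = list(product((1, 2), repeat=3))   # all 8 runs of a 2^3 design
print("runs in full factorial:", len(full_factorial))
print("runs in L4 array:", len(L4))
print("L4 is orthogonal:", is_orthogonal(L4))
```

The array uses half of the eight runs of the full $2^3$ factorial, which is why it serves as a fractional factorial design.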


We can summarize the Taguchi approach to quality design as follows: 


1. Taguchi's methods for experimental design are ready made and simple to use in the design of 
efficient experiments, even by nonexperts. 

2. Taguchi's approach to total quality management is holistic and tries to design quality into a 
product rather than inspecting defects in the final product. 

3. Taguchi's techniques can readily be applied to other fields such as management problems. 


EXERCISES 9.5 


9.5.1. Suppose the following data represent thickness between and within silicon wafers (in 
microns), with a target value of 14.5 microns. 


13.688 13.788 14.173 14.557 
13.925 14.545 13.797 14.778 


Compute the Taguchi loss function. 


9.5.2. One of the commonly used performance measures in the Taguchi method is 


$$\log\left(\frac{(\text{mean})^2}{s^2}\right),$$


where $s^2$ is the sample variance. In general, the higher the performance measure, the better the 
design. This measure is called a robustness statistic. For the problem of Exercise 9.5.1, suppose 
that we run the experiment by controlling various factors affecting the thickness. Table 9.5.1 
shows the data obtained in four different runs. 

Table 9.5.1 


Run 1: 14.158 14.754 14.412 14.065 13.802 14.424 14.898 14.187 


Run 2: 13.676 14.177 14.201 14.557 13.827 14.514 13.897 14.278 


Run 3: 13.868 13.898 14.773 13.597 13.628 14.655 14.597 14.978 


Run 4: 13.668 13.788 14.173 14.557 13.925 14.545 13.797 14.778 


(a) Using the robustness statistics given earlier, which of the processes gives us an improved 
performance? 
(b) Another commonly used performance statistic is 


$$-\log(s^2).$$


Using this robustness statistic, which of the processes gives us an improved performance? 
Compare this with the results of part (a). 


9.6 CHAPTER SUMMARY 


In this chapter, we have learned some basic aspects of experimental design. Some fundamental 
definitions and tools for developing experimental designs such as randomization, replication, and 
blocking were introduced in Section 9.2. Basic concepts of factorial design were given in Section 9.3. In 
Section 9.4, we saw an example of optimal design. The Taguchi method was introduced in Section 9.5. 
In the next chapter, we introduce the analysis component. We have discussed only a very small col- 
lection of experimental designs in this chapter. There exist a wide variety of experimental designs to 
deal with a large number of treatments and to suit specific needs of research experiments in diverse 
fields. It is an exciting and growing area for the interested student to apply and explore. 


We list some of the key definitions introduced in this chapter: 


Response variable (output variable) 
Independent variables (treatment variables or input variables or factors) 
Nuisance variables 
Noise 
Observational 
Experimental units 
Single-factor experiments 
Multifactor experiments 
Experimental error 
Blinding, double-blinding, and placebo 
Replication 
Block 
Randomization 
Completely randomized design 
Randomized complete block design 
$k \times k$ Latin square design 
Greco-Latin square 
Design parameters 


In this chapter, we have also learned the following important concepts and procedures. 


Procedure for random assignment 

Procedure for randomization in a randomized complete block design 
Procedure for a randomized complete block design with r replications 
Procedure for constructing a 4 x 4 Latin square 

One-factor-at-a-time design 

Full factorial design 

Fractional factorial design 

Choice of optimal sample size 

The Taguchi methods 


9.7 COMPUTER EXAMPLES 


In this chapter, we present Minitab and SAS commands only. SPSS commands can be performed 
similarly to Minitab. 


9.7.1 Minitab Examples 




Example 9.7.1 
Obtain a random permutation of the numbers 1 to n. 


Solution 


Enter in C1 the numbers 1 to n, say n = 10. Then 


Calc > Random Data > Sample From Columns... > 
enter Sample 10 rows from column(s) C1 > Store samples in: C2 > OK 


The result is a random permutation of the numbers 1 to n (= 10). One such permutation is given by 
8 5 9 7 10 6 4 3 2 1 


Now if we need to generate blocks of random permutations of the numbers 1 to n (= 10), repeat the 
foregoing steps and store the samples in C3, C4, .... 
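
For readers without Minitab, the same task can be sketched in Python using only the standard library. This is an illustrative alternative rather than part of the Minitab procedure; the function name `random_permutation` and the `seed` argument are assumptions introduced here.

```python
import random


def random_permutation(n, seed=None):
    """Return the numbers 1..n in random order."""
    rng = random.Random(seed)           # a seed makes the result reproducible
    perm = list(range(1, n + 1))
    rng.shuffle(perm)                   # in-place uniform shuffle
    return perm


# One permutation of 1..10, then three more "blocks" of permutations.
print(random_permutation(10))
blocks = [random_permutation(10) for _ in range(3)]
for block in blocks:
    print(block)
```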


9.7.2 SAS Examples 


Example 9.7.2 
For the data of Example 9.2.4, conduct a randomized complete block design using SAS. 

Solution 
We represent the blocks, which are the reasons for pain, by H = 1, M = 2, and CB = 3. Similarly, we 
represent the five brands, which are the treatments, by A = 1, B = 2, C = 3, D = 4, and E = 5. Then 
we can use the following code to generate a randomized complete block design. 


options nodate nonumber; 
data a; 
  do block = 1 to 3;              /* three blocks: H, M, CB */ 
    do subject = 1 to 5;          /* five subjects per block */ 
      x = ranuni(0);              /* uniform random number used as a sort key */ 
      output; 
    end; 
  end; 
proc sort; by block x;            /* random order of subjects within each block */ 
data c; set a; 
  trt = 1 + mod(_N_ - 1, 5);      /* treatments 1 to 5 in the randomized order */ 
proc sort; by block subject; 
proc print; 
  var block subject trt; 
run; 


We get the following output. 


Completely randomized 2 x 3 design, 4 subjects per cell 

Obs   block   subject   trt 
  1     1        1       5 
  2     1        2       4 
  3     1        3       3 
  4     1        4       2 
  5     1        5       1 
  6     2        1       2 
  7     2        2       5 
  8     2        3       3 
  9     2        4       4 
 10     2        5       1 
 11     3        1       4 
 12     3        2       5 
 13     3        3       1 
 14     3        4       2 
 15     3        5       3 

Note that the numbers in the block column identify the type of pain, the numbers in the subject 
column correspond to the subjects, and the numbers in the trt column identify the brands. Using the 
corresponding letters, we can rewrite the foregoing table in the familiar form shown in Table 9.14. 


Table 9.14 

H       M       CB 
1(E)    1(B)    1(D) 
2(D)    2(E)    2(E) 
3(C)    3(C)    3(A) 
4(B)    4(D)    4(B) 
5(A)    5(A)    5(C) 

The PLAN procedure constructs experimental designs. The PLAN procedure does not have a DATA= 
option in the PROC statement; in this procedure, both the input and output data sets are specified 
in the OUTPUT statement. We will use this to construct a Latin square design. 


Example 9.7.3 
A gasoline company is interested in comparing the effect of four gasoline additives (A, B, C, D) on the gas 
mileage achieved per gallon. Four cars (1, 2, 3, 4) and four drivers (I, II, III, IV) will be used in the experiment. 
Create a Latin square design. 


Solution 
We can use the following program, where we represent the additives by 1 = A, 2 = B, 3 = C, and 4 = D. 


options nodate nonumber; 
title 'Latin Square design for 4 additives'; 
proc plan seed=37432; 
  factors rows=4 ordered cols=4 ordered / noprint; 
  treatments tmts=4 cyclic; 
  output out=g 
    rows cvals=('car 1' 'car 2' 'car 3' 'car 4') 
         random 
    cols cvals=('Driver 1' 'Driver 2' 'Driver 3' 
                'Driver 4') random 
    tmts nvals=(1 2 3 4) random; 
run; 
proc tabulate; 
  class rows cols; 
  var tmts; 
  table rows, cols*(tmts*f=1.); 
  keylabel sum=' '; 
run; 
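

The idea behind this program (start from a cyclic square, then randomize rows, columns, and treatment labels) can be sketched in Python as well. The code below is an illustrative analogue and not a reimplementation of PROC PLAN; the function names and the seed are assumptions. Note that this construction does not sample uniformly from all possible Latin squares.

```python
import random


def random_latin_square(treatments, seed=None):
    """Build a cyclic Latin square, then randomize rows, columns, and labels."""
    rng = random.Random(seed)
    k = len(treatments)
    square = [[(i + j) % k for j in range(k)] for i in range(k)]   # cyclic square
    rng.shuffle(square)                                            # permute rows
    col_order = list(range(k))
    rng.shuffle(col_order)                                         # permute columns
    square = [[row[c] for c in col_order] for row in square]
    labels = list(treatments)
    rng.shuffle(labels)                                            # relabel treatments
    return [[labels[cell] for cell in row] for row in square]


def is_latin_square(square):
    """Each symbol must occur exactly once in every row and every column."""
    k = len(square)
    symbols = set(square[0])
    rows_ok = all(len(row) == k and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return len(symbols) == k and rows_ok and cols_ok


square = random_latin_square(["A", "B", "C", "D"], seed=37432)
for car, row in enumerate(square, start=1):       # rows = cars, columns = drivers
    print(f"car {car}: " + " ".join(row))
print("valid Latin square:", is_latin_square(square))
```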


PROJECTS FOR CHAPTER 9 


9A. Sample Size and Power 


Suppose that the experimenter is interested in comparing the true means of two independent 
populations. If two similar treatments are to be compared, the assumption of equality of variances 
is not unreasonable. Hence, assume that the common variance of the two populations is $\sigma^2$, and 
the experimenter has a prior estimate of the variance. We learned in Section 9.4 that in this case, the 
optimal design will be to take sample sizes $n_1$ and $n_2$ to be equal. Let $n = n_1 = n_2$ be the size of 
the random sample that the experimenter should take from each population. 


Now, suppose that the experimenter has decided to use the one-sided large sample test, 
$H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 > \mu_2$ with a fixed $\alpha = P(\text{Type I error})$. He wants to 
choose $n$ to be so large that if $\mu_1 = \mu_2 + k\sigma$, he will get a fixed power $(1 - \beta)$ of 
deciding $\mu_1 > \mu_2$. Recall that the power of a test is the probability of (correctly) rejecting 
$H_0$ when $H_0$ is false. Find the approximate value of $n$. Note that, for a given $\alpha$, this will 
be an optimal sample size with a desired value of the power. 


In particular, what should be the sample size in the hypothesis testing problem, 
$H_0: \mu_1 - \mu_2 = 0$ vs. $H_a: \mu_1 - \mu_2 = 3$, if $\alpha = \beta = 0.05$? Assume that $\sigma = 7$. 


9B. Effect of Temperature on Spoilage of Milk 


Suppose you have observed that milk in your refrigerator spoils very fast. You may be wondering 
whether it has anything to do with the temperature settings. Design an experiment to study the effect 
of temperature on spoiled milk, with at least three meaningful settings of the temperature. (i) Write 
a possible hypothesis for your experiment. (ii) What are the independent and dependent variables? 
(iii) Which variables are being controlled in this experiment? (iv) Discuss how you used the three 
basic principles of replication, blocking, and randomization. (v) What conclusions can you make? 
Think through any possible flaws in the design that may affect the integrity of your findings. 


Objective: To analyze the means of several populations by identifying sources of variability of 
the data. 


10.1 Introduction 
10.2 Analysis of Variance Method for Two Treatments (Optional) 
10.3 Analysis of Variance for Completely Randomized Design 
10.4 Two-Way Analysis of Variance, Randomized Complete Block Design 
10.5 Multiple Comparisons 
10.6 Chapter Summary 
10.7 Computer Examples 
Projects for Chapter 10 

John Wilder Tukey 
(Source: http://en.wikipedia.org/wiki/John_Tukey) 


Mathematical Statistics with Applications 
Copyright © 2009 by Academic Press, Inc. All rights of reproduction in any form reserved. 


CHAPTER 10 Analysis of Variance 


John W. Tukey (1915-2000), a chemist-turned-topologist-turned statistician, was one of the most 
influential statisticians of the past 50 years. He is credited with inventing the word software. He 
worked as a professor at Princeton University and a senior researcher at AT&T's Bell Laboratories. He 
made significant contributions to the fields of exploratory data analysis and robust estimation. His 
works on the spectrum analysis of time series and other aspects of digital signal processing have been 
widely used in engineering and science. He coined the word bit, which refers to a unit of information 
processed by a computer. In collaboration with Cooley, in 1965, Tukey introduced the fast Fourier 
transform (FFT) algorithm that greatly simplified computation for Fourier series and integrals. Tukey 
authored or coauthored many books in statistics and wrote more than 500 technical papers. Among 
Tukey's most far-reaching contributions was his development of techniques for “robust analysis,” 
an approach to statistics that guards against wrong answers in situations where a randomly chosen 
sample of data happens to poorly represent the rest of the data set. Tukey also made significant 
contributions to the analysis of variance. 


10.1 INTRODUCTION 


Suppose that we are interested in the effect of four different types of chemical fertilizers on the yield 
of rice, measured in pounds per acre. If there is no difference between the different types of fertilizers, 
then we would expect all the mean yields to be approximately equal. Otherwise, we would expect the 
mean yields to differ. The different types of fertilizers are called treatments and their effects are the 
treatment effects. The yield is called the response. Typically we have a model with a response variable 
that is possibly affected by one or more treatments. The study of these types of models falls under the 
purview of design of experiments, which we discussed in Chapter 9. In this chapter we concentrate on 
the analysis aspect of the data obtained from the designed experiments. If the data came from one or 
two populations, we could use the techniques learned in Chapters 6 and 7. Here, we introduce some 
tests that are used to analyze the data from more than two populations. These tests are used to deal 
with treatment effects, including tests that take into account other factors that may affect the response. 
The hypothesis that the population means are equal is considered equivalent to the hypothesis that 
there is no difference in treatment effects. The analytical method we will use in such problems is 
called the analysis of variance (ANOVA). Initial development of this method could be credited to Sir 
Ronald A. Fisher who introduced this technique for the analysis of agricultural field experiments. The 
“green revolution” in agriculture would have been impossible without the development of theory of 
experimental design and the methods of analysis of variance. 


Analysis of variance is one of the most flexible and practical techniques for comparing several means. 
It is important to observe that analysis of variance is not about analyzing the population variance. In 
fact, we are analyzing treatment means by identifying sources of variability of the data. In its simplest 
form, analysis of variance can be considered as an extension of the test of hypothesis for the equality 
of two means that we learned in Chapter 7. Actually, the so-called one-way analysis of variance is 
a generalization of the two-means procedure to a test of equality of the means of more than two 
independent, normally distributed populations. 

Recall that the methods of testing $H_0: \mu_1 - \mu_2 = 0$, such as the t-test, were discussed earlier. In 
this chapter, we are concerned with studying situations involving the comparison of more than two 
population or treatment means. For example, we may be interested in the question “Do the rates 
of heart attack and stroke differ for three different groups of people with high cholesterol levels 
(borderline high such as 150-199 mg/dL, high such as 200-239 mg/dL, very high such as greater 
than 240 mg/dL) and a control group given different dosage levels of a particular cholesterol-lowering 
drug (say, a particular statin drug)?” Let us consider four populations with means $\mu_1$, $\mu_2$, 
$\mu_3$, and $\mu_4$, and say that we wish to test the hypothesis $\mu_1 = \mu_2 = \mu_3 = \mu_4$. That is, 
the mean rate is the same for all the four groups. The question here is: Why do we need a new method 
to test for differences among the four population means? Why not use z- or t-tests for all possible 
pairs and test for differences in each pair? If any one of these tests leads to the rejection of the 
hypothesis of equal means, then we might conclude that at least two of the four population means 
differ. The problem with this approach is that our final decision is based on the results of 
$\binom{4}{2} = 6$ different tests, and any one of them can be wrong. For each of the six tests, let 
$\alpha = 0.10$ be the probability of being wrong (type I error). Then, treating the six tests as 
independent, the probability that at least one of them wrongly leads to the conclusion that there is a 
difference is $1 - (0.9)^6 = 0.46856$, which clearly is much larger than 0.10, thus resulting in a large 
increase in the type I error rate. Hence, if an ordinary t-test is used to make several treatment 
comparisons from the same data, the actual $\alpha$-value applying to the tests taken as a group 
will be larger than the specified value of $\alpha$, and one is likely to declare significance when there 
is none. 
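

The arithmetic in this argument is easy to reproduce. The short Python sketch below assumes, as the calculation above does, that the pairwise tests behave independently; the function name `familywise_error` is hypothetical. It counts the pairwise comparisons among $k$ means and computes the chance of at least one false rejection.

```python
from math import comb


def familywise_error(k, alpha):
    """Number of pairwise tests among k means and P(at least one type I error)."""
    m = comb(k, 2)                      # number of pairs of means
    return m, 1 - (1 - alpha) ** m      # assumes the m tests are independent


m, fwer = familywise_error(4, 0.10)
print(m, round(fwer, 5))                # 6 0.46856

# The inflation grows quickly with the number of groups.
for k in (3, 4, 5, 6):
    print(k, familywise_error(k, 0.10))
```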


Analysis of variance procedures were developed to eliminate the increase in error rates resulting from 
multiple t-tests. With ANOVA, we are able to set one alpha level and test whether any of the group 
means differ from one another. Given a sample from each of the populations, our interest is to answer 
the question: Are the observed discrepancies among the different sample means merely due to chance 
fluctuations, or are they due to inherent differences among the populations? Analysis of variance 
separates the effect of purely random variations from those caused by existing differences among 
population means. The phrase “analysis of variance” springs from the idea of analyzing variability in 
the data to see how much can be attributed to differences in the population means $\mu_i$ and how much 
is due to variability in the individual populations. The ANOVA method incorporates information on 
variability from all of 
the samples simultaneously. At the heart of ANOVA is the fact that variances can be partitioned, with 
each partition attributable to a specific source. The method inspects various sums of squares (which 
are measures of variation in a sample) calculated from the data. ANOVA looks at two types of sums 
of squares: sums of squares within groups and sums of squares between groups. That is, it looks at 
each of the distributions and compares the between-group differences (variation in group means) 
with the within-group differences (variation in individuals’ scores within groups). 


10.2 ANALYSIS OF VARIANCE METHOD FOR TWO TREATMENTS (OPTIONAL) 


In this section, we present the simplest form of the analysis of variance procedure, the case of studying 
the means of two populations I and II. For comparing only two means, the ANOVA will result in the 
same conclusions as the t-test for independent random samples. The basic purpose of this section is to 
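introduce the concept of ANOVA in simpler terms. Let us consider two random samples of sizes $n_1$ and 
$n_2$, respectively. That is, $y_{11}, y_{12}, \ldots, y_{1n_1}$ is a sample from population I and 
$y_{21}, y_{22}, \ldots, y_{2n_2}$ is a sample from population II. Let 

$$\bar{y}_1 = \frac{y_{11} + y_{12} + \cdots + y_{1n_1}}{n_1} \quad \text{(sample mean from population I)}$$

and 

$$\bar{y}_2 = \frac{y_{21} + y_{22} + \cdots + y_{2n_2}}{n_2} \quad \text{(sample mean from population II)}.$$

These samples are assumed to be independent and come from normal populations with respective 
means $\mu_1$, $\mu_2$, and variances $\sigma_1^2 = \sigma_2^2$. We wish to test the hypothesis 

$$H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_a: \mu_1 \neq \mu_2.$$

The total variation of the two combined response measurements about $\bar{y}$ (the sample mean of all 
$n = n_1 + n_2$ observations) is defined by (SS is used for sum of squares) 

$$\text{Total SS} = \sum_{i=1}^{2} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2, \tag{10.1}$$

where 

$$\bar{y} = \frac{y_{11} + y_{12} + \cdots + y_{1n_1} + y_{21} + y_{22} + \cdots + y_{2n_2}}{n}.$$

The total sum of squares measures the total spread of scores around the grand mean, $\bar{y}$. We can 
rewrite (10.1) as 

$$\begin{aligned}
\text{Total SS} &= \sum_{i=1}^{2} \sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2 \\
&= \sum_{j=1}^{n_1} (y_{1j} - \bar{y})^2 + \sum_{j=1}^{n_2} (y_{2j} - \bar{y})^2 \\
&= \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_1 + \bar{y}_1 - \bar{y})^2 + \sum_{j=1}^{n_2} (y_{2j} - \bar{y}_2 + \bar{y}_2 - \bar{y})^2 \\
&= \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_1)^2 + n_1 (\bar{y}_1 - \bar{y})^2 + 2 (\bar{y}_1 - \bar{y}) \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_1) \\
&\quad + \sum_{j=1}^{n_2} (y_{2j} - \bar{y}_2)^2 + n_2 (\bar{y}_2 - \bar{y})^2 + 2 (\bar{y}_2 - \bar{y}) \sum_{j=1}^{n_2} (y_{2j} - \bar{y}_2).
\end{aligned}$$

Note that $\sum_{j=1}^{n_1} (y_{1j} - \bar{y}_1) = 0 = \sum_{j=1}^{n_2} (y_{2j} - \bar{y}_2)$. We obtain 

$$\begin{aligned}
\text{Total SS} &= \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_1)^2 + \sum_{j=1}^{n_2} (y_{2j} - \bar{y}_2)^2 + n_1 (\bar{y}_1 - \bar{y})^2 + n_2 (\bar{y}_2 - \bar{y})^2 \\
&= \sum_{i=1}^{2} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 + \sum_{i=1}^{2} n_i (\bar{y}_i - \bar{y})^2.
\end{aligned} \tag{10.2}$$

Define SST, the sum of squares for treatment, by 

$$\text{SST} = \sum_{i=1}^{2} n_i (\bar{y}_i - \bar{y})^2.$$

The SST measures the total spread of the group means $\bar{y}_i$ with respect to the grand mean, 
$\bar{y}$. Also, SSE represents the sum of squares of errors given by 

$$\begin{aligned}
\text{SSE} &= \sum_{i=1}^{2} \sum_{j=1}^{n_i} (y_{ij} - \bar{y}_i)^2 \\
&= \sum_{j=1}^{n_1} (y_{1j} - \bar{y}_1)^2 + \sum_{j=1}^{n_2} (y_{2j} - \bar{y}_2)^2 \\
&= (n_1 - 1) s_1^2 + (n_2 - 1) s_2^2
\end{aligned}$$

where $s_1^2$ and $s_2^2$ are the unbiased sample variances of the two random samples. Note that this 
connects the sum of squares to the concept of variance we have been using in previous chapters. We can 
now rewrite (10.2) as 

$$\text{Total SS} = \text{SSE} + \text{SST}.$$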
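

The decomposition can be verified numerically on any pair of samples. The Python sketch below uses two small invented samples (not data from the text) and a hypothetical helper `sums_of_squares` that computes each quantity directly from its definition, so the identity Total SS = SSE + SST can be checked rather than assumed.

```python
def sums_of_squares(sample1, sample2):
    """Return (Total SS, SST, SSE), each computed directly from its definition."""
    groups = [sample1, sample2]
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    total_ss = sum((y - grand_mean) ** 2 for g in groups for y in g)
    sst = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    sse = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    return total_ss, sst, sse


# Invented samples of sizes n1 = 3 and n2 = 4.
sample1 = [4, 6, 8]
sample2 = [9, 11, 13, 15]

total_ss, sst, sse = sums_of_squares(sample1, sample2)
print("Total SS =", total_ss)
print("SST      =", sst)
print("SSE      =", sse)
print("SSE + SST =", sse + sst)
```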


It should be clear that the SSE measures the within-sample variation of the y-values (effects), whereas 
SST measures the variation among the two sample means. The logic behind the analysis of variance 
test is as follows: If the null hypothesis is true, then SST should be small relative to SSE. The larger 
SST is, the greater will be the weight of evidence to indicate a difference in the means $\mu_1$ and 
$\mu_2$. The question then is, how large? 


To answer this question, let us suppose we have two populations that are normal. That is, let $Y_{ij}$ be 
$N(\mu_i, \sigma^2)$ distributed with values $y_{ij}$. Then the pooled unbiased estimate of $\sigma^2$ is 
given by 

$$s_p^2 = \frac{(n_1 - 1) s_1^2 + (n_2 - 1) s_2^2}{n_1 + n_2 - 2} = \frac{\text{SSE}}{n_1 + n_2 - 2}.$$

Hence, 

$$E\left(\frac{\text{SSE}}{n_1 + n_2 - 2}\right) = \sigma^2.$$

Also, we can write 

$$\frac{\text{SSE}}{\sigma^2} = \sum_{i=1}^{2} \sum_{j=1}^{n_i} \frac{(Y_{ij} - \bar{Y}_i)^2}{\sigma^2},$$

which has a $\chi^2$-distribution with $(n_1 + n_2 - 2)$ degrees of freedom. 


Under the hypothesis that w1 = 42, E (SST) = o?. Furthermore, 


Y, -Y. 
$a =e HD 1h, 


This implies that 


ge: 1,1 Y1—-Y2|_ SST 
7 ny n2 o2 ~ o2 


has a x*—distribution with 1 degree of freedom. It can be shown that SST and SSE are independent. 
From Chapter 4, we restate the following result. 


Theorem 10.2.1 If $\chi_1^2$ has $\nu_1$ degrees of freedom, $\chi_2^2$ has $\nu_2$ degrees of freedom, and $\chi_1^2$ and $\chi_2^2$ are
independent, then $F = \frac{\chi_1^2/\nu_1}{\chi_2^2/\nu_2}$ has an F-distribution with $\nu_1$ numerator degrees of freedom and
$\nu_2$ denominator degrees of freedom.


Using the foregoing result, we have

$$\frac{\mathrm{SST}/[(1)\sigma^2]}{\mathrm{SSE}/[(n_1 + n_2 - 2)\sigma^2]} = \frac{\mathrm{SST}/1}{\mathrm{SSE}/(n_1 + n_2 - 2)},$$

which has an F-distribution with $\nu_1 = 1$ numerator degrees of freedom and $\nu_2 = (n_1 + n_2 - 2)$
denominator degrees of freedom.
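
The identity $Z^2 = \mathrm{SST}/\sigma^2$ used in this argument may not be obvious at first sight. The following
short derivation is a sketch of the missing algebra; it assumes only that the grand mean $\bar{y}$ is the
weighted average of the two group means.

```latex
\begin{align*}
\bar{y} &= \frac{n_1\bar{y}_1 + n_2\bar{y}_2}{n_1 + n_2}, \\
\bar{y}_1 - \bar{y} &= \frac{n_2(\bar{y}_1 - \bar{y}_2)}{n_1 + n_2}, \qquad
\bar{y}_2 - \bar{y} = -\frac{n_1(\bar{y}_1 - \bar{y}_2)}{n_1 + n_2}, \\
\mathrm{SST} &= n_1(\bar{y}_1 - \bar{y})^2 + n_2(\bar{y}_2 - \bar{y})^2
  = \frac{n_1 n_2^2 + n_2 n_1^2}{(n_1 + n_2)^2}\,(\bar{y}_1 - \bar{y}_2)^2 \\
 &= \frac{n_1 n_2}{n_1 + n_2}\,(\bar{y}_1 - \bar{y}_2)^2
  = \frac{(\bar{y}_1 - \bar{y}_2)^2}{\frac{1}{n_1} + \frac{1}{n_2}}.
\end{align*}
```

Dividing the last expression by $\sigma^2$ gives exactly the square of the standardized difference of the two
sample means.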


Now, we introduce the mean square error (MSE), defined by

$$\mathrm{MSE} = \frac{\mathrm{SSE}}{n_1 + n_2 - 2} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$

and the mean square treatment (MST), given by

$$\mathrm{MST} = \frac{\mathrm{SST}}{1} = n_1(\bar{y}_1 - \bar{y})^2 + n_2(\bar{y}_2 - \bar{y})^2.$$

Under the null hypothesis, $H_0: \mu_1 = \mu_2$, both MST and MSE estimate $\sigma^2$ without bias. When $H_0$ is
false and $\mu_1 \neq \mu_2$, MST estimates something larger than $\sigma^2$ and will tend to be larger than MSE. That
is, if $H_0$ is false, then E(MST) > E(MSE), and the greater the differences among the values of $\mu$, the
larger E(MST) will be relative to E(MSE).


Hence, to test $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \neq \mu_2$, we use the F-test given by

$$F = \frac{\mathrm{MST}}{\mathrm{MSE}}$$

as the test statistic. Thus, for given $\alpha$, the rejection region is $\{F > F_\alpha\}$. It is important to observe
that, compared to the small sample t-test, here we work with variability. Now we summarize the analysis
of variance procedure for the two-sample case.


ANALYSIS OF VARIANCE PROCEDURE FOR TWO TREATMENTS

For equal sample sizes $n = n_1 = n_2$, assume $\sigma_1^2 = \sigma_2^2$.

We test

$$H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_a: \mu_1 \neq \mu_2.$$

1. Calculate $\bar{y}_1$, $\bar{y}_2$, $\sum_{i,j} y_{ij}^2$, $\sum_{i,j} y_{ij}$, and find

$$\mathrm{SST} = \sum_{i=1}^{2} n_i (\bar{y}_i - \bar{y})^2.$$

Also calculate

$$\text{Total SS} = \sum_{i,j} y_{ij}^2 - \frac{\left(\sum_{i,j} y_{ij}\right)^2}{n_1 + n_2}.$$

Then

$$\mathrm{SSE} = \text{Total SS} - \mathrm{SST}.$$

2. Compute

$$\mathrm{MST} = \frac{\mathrm{SST}}{1}, \qquad \mathrm{MSE} = \frac{\mathrm{SSE}}{n_1 + n_2 - 2}.$$

3. Compute the test statistic,

$$F = \frac{\mathrm{MST}}{\mathrm{MSE}}.$$

4. For a given $\alpha$, find the rejection region as

$$\text{RR}: F > F_\alpha,$$

based on 1 numerator and $(n_1 + n_2 - 2)$ denominator degrees of freedom.

5. Conclusion: If the test statistic F falls in the rejection region, conclude that the sample evidence
supports the alternative hypothesis that the means are indeed different for the two treatments.
Assumptions: Populations are normal with equal but unknown variances.
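
Steps 1 through 3 of the procedure translate almost line by line into code. The following Python sketch is
illustrative only: the function name `two_sample_anova` and the toy data are assumptions made for this
example, and the critical value $F_\alpha$ in step 4 still has to be read from an F-table or obtained from
statistical software.

```python
def two_sample_anova(sample1, sample2):
    """Return the ANOVA quantities for two treatments (steps 1-3 of the procedure)."""
    n1, n2 = len(sample1), len(sample2)
    all_values = list(sample1) + list(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    grand_mean = sum(all_values) / (n1 + n2)

    # Step 1: SST from the group means, Total SS from the shortcut formula.
    sst = n1 * (mean1 - grand_mean) ** 2 + n2 * (mean2 - grand_mean) ** 2
    total_ss = sum(y * y for y in all_values) - sum(all_values) ** 2 / (n1 + n2)
    sse = total_ss - sst

    # Step 2: mean squares (1 and n1 + n2 - 2 degrees of freedom).
    mst = sst / 1
    mse = sse / (n1 + n2 - 2)

    # Step 3: test statistic.
    return {"SST": sst, "SSE": sse, "MST": mst, "MSE": mse, "F": mst / mse}


# Hypothetical toy data, chosen only so that the arithmetic is easy to follow by hand.
result = two_sample_anova([1, 2, 3], [2, 4, 6])
print(result)
```

The returned F value is then compared with the tabulated $F_\alpha$ for 1 and $n_1 + n_2 - 2$ degrees of freedom.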


Example 10.2.1 
The following data represent a random sample of end-of-year bonuses for lower-level managerial personnel
employed by a large firm. Bonuses are expressed in percentage of yearly salary.

Female    6.2    9.2    8.0    7.7    8.4    9.1    7.4    6.7
Male      8.9   10.0    9.4    8.8   12.0    9.9   11.7    9.8

The objective is to determine whether the male and female bonuses are the same. We can answer this
question by considering the following.
(a) Use the ANOVA approach to test the appropriate hypotheses. Use $\alpha$ = 0.05.
(b) What assumptions are necessary for the test in part (a)?
(c) Test the appropriate hypothesis by using the two-sample t-test for comparing population means.
Compare the value of the t-statistic to the value of the F-statistic calculated in part (a).

Solution
(a) We need to test

$$H_0: \mu_1 = \mu_2 \quad \text{vs.} \quad H_a: \mu_1 \neq \mu_2.$$

From the random sample, we obtain the following needed estimates, with $n_1 = n_2 = 8$:

$$\bar{y}_1 = 7.8375, \quad \bar{y}_2 = 10.0625, \quad \sum_{i,j} y_{ij}^2 = 1319.34, \quad \sum_{i,j} y_{ij} = 143.20,$$

$$\mathrm{SST} = \sum_{i=1}^{2} n_i (\bar{y}_i - \bar{y})^2 = 19.8025.$$

Therefore,

$$\text{Total SS} = \sum_{i,j} y_{ij}^2 - \frac{\left(\sum_{i,j} y_{ij}\right)^2}{2n_1}
 = 1319.34 - \frac{(143.2)^2}{16} = 37.70.$$

Then

$$\mathrm{SSE} = \text{Total SS} - \mathrm{SST} = 37.70 - 19.8025 = 17.8975,$$

$$\mathrm{MST} = \frac{\mathrm{SST}}{1} = 19.8025,$$

and

$$\mathrm{MSE} = \frac{\mathrm{SSE}}{2n_1 - 2} = \frac{17.8975}{14} = 1.2784.$$

Hence, the test statistic is

$$F = \frac{\mathrm{MST}}{\mathrm{MSE}} = \frac{19.8025}{1.2784} = 15.49.$$

For $\alpha$ = 0.05, $F_{0.05,1,14} = 4.60$. Hence the rejection region is $\{F > 4.60\}$. Because 15.49 is
greater than 4.60, $H_0$ is rejected. There is sufficient evidence to indicate that the average
bonuses are different for men and women at $\alpha$ = 0.05.

(b) To solve the problem, we assumed that the samples are random and independent with $n_1 = n_2 = 8$,
drawn from two normal populations with means $\mu_1$ and $\mu_2$ and common variance $\sigma^2$.

(c) The value of MSE is the same as $s_p^2 = 1.2784$. Also, $\bar{y}_1 = 7.8375$ and $\bar{y}_2 = 10.0625$. Then,
the t-statistic is

$$t = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{s_p^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
 = \frac{7.8375 - 10.0625}{\sqrt{1.2784\left(\frac{1}{8} + \frac{1}{8}\right)}} = -3.936.$$

Now, $t_{0.025,14} = 2.145$ and the rejection region is $\{|t| > 2.145\}$.

Because |-3.936| = 3.936 is greater than 2.145, $H_0$ is rejected, which implies that there is a significant
difference between the bonuses for the males and the females.

Note also that $t^2 = F$, that is, $(-3.936)^2 \approx 15.49$, implying that in the two-sample case, the t-test and
F-test lead to the same result.
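
The arithmetic of this example can be checked directly from the raw data. The sketch below assumes the
bonus values carry one decimal place (6.2, 9.2, and so on), which is consistent with the reported means
7.8375 and 10.0625; the variable names are illustrative. It recomputes the sums of squares, the F-statistic,
and the pooled two-sample t-statistic, and confirms numerically that $t^2 = F$.

```python
import math

# Bonus data from Example 10.2.1 (percentage of yearly salary); decimals are assumed.
female = [6.2, 9.2, 8.0, 7.7, 8.4, 9.1, 7.4, 6.7]
male = [8.9, 10.0, 9.4, 8.8, 12.0, 9.9, 11.7, 9.8]

n1, n2 = len(female), len(male)
values = female + male
mean_f = sum(female) / n1
mean_m = sum(male) / n2
grand_mean = sum(values) / (n1 + n2)

# ANOVA quantities.
sst = n1 * (mean_f - grand_mean) ** 2 + n2 * (mean_m - grand_mean) ** 2
total_ss = sum(y * y for y in values) - sum(values) ** 2 / (n1 + n2)
sse = total_ss - sst
mse = sse / (n1 + n2 - 2)
f_stat = sst / mse

# Pooled two-sample t-statistic; MSE plays the role of the pooled variance.
t_stat = (mean_f - mean_m) / math.sqrt(mse * (1 / n1 + 1 / n2))

print("SST =", round(sst, 4), " SSE =", round(sse, 4), " MSE =", round(mse, 4))
print("F =", round(f_stat, 2), " t =", round(t_stat, 3), " t^2 =", round(t_stat ** 2, 2))
```

Printing both statistics side by side makes it easy to spot a transcription slip in any intermediate sum.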
It is not surprising that in the previous example, the conclusions reached using ANOVA and two-sample
t-tests are the same. In fact, it can be shown that for two sets of independent and normally
distributed random variables, the two procedures are entirely equivalent for a two-sided hypothesis.
However, a t-test can also be applied to a one-sided hypothesis, whereas ANOVA cannot. The purpose
of this section is only to illustrate the computations involved in the analysis of variance procedures
as opposed to simple t-tests. The analysis of variance procedure is most useful for three or more
populations, which is described in the next section.


EXERCISES 10.2 


10.2.1. The following information was obtained from two independent samples selected from two
normally distributed populations with unknown but equal standard deviations. Do the
data present sufficient evidence to indicate that there is a difference in the mean for the two
populations?


Sample 1: | 1 | 2 | 3 | 3 | 1 | 2 | 1 | 3 | 1
Sample 2: | 2 | 5 | 2 | 4 | 3 | 1 | 2 | 3


(a) Use the ANOVA approach to test the appropriate hypotheses. Use $\alpha$ = 0.05.

(b) Test the appropriate hypothesis by using the two-sample t-test for comparing population
means. Compare the value of the t-statistic to the value of the F-statistic calculated in
part (a).


10.2.2. The following information was obtained from two independent samples selected from two
normally distributed populations with unknown but equal standard deviations. Do the
data present sufficient evidence to indicate that there is a difference in the mean for the two
populations?


Sample 1: | 15 | 13 | 11 | 14 | 10 | 12 | 7 | 12 | 11 | 14 | 15
Sample 2: | 18 | 16 | 13 | 21 | 16 | 19 | 15 | 18 | 19 | 20 | 21 | 14


(a) Use the ANOVA approach to test the appropriate hypotheses. Use $\alpha$ = 0.01.

(b) Test the appropriate hypothesis by using the two-sample t-test for comparing population
means. Compare the value of the t-statistic to the value of the F-statistic calculated in
part (a).


10.2.3. A company claims that its medicine, brand A, provides faster relief from pain than another 
company’s medicine, brand B. A random sample from each brand gave the following times 
(in minutes) for relief. Do the data present sufficient evidence to indicate that there is a 
difference in the mean time to relief for the two populations? 


Brand A: | 47 | 51 | 45 | 53 | 41 | 55 | 50 | 46 | 45 | 51 | 53 | 50 | 48
Brand B: | 44 | 48 | 42 | 45 | 44 | 42 | 49 | 46 | 45 | 48 | 39 | 49


(a) Use the ANOVA approach to test the appropriate hypotheses. Use $\alpha$ = 0.01.
(b) What assumptions are necessary for the conclusion in part (a)?
(c) Test the appropriate hypothesis by using the two-sample t-test for comparing population
means. Compare the value of the t-statistic to the value of the F-statistic calculated in
part (a).


10.2.4. Table 10.2.1 gives mean SAT scores for math by state for 1989 and 1999 for 20 randomly 
selected states (source: The World Almanac and Book of Facts 2000). 


Table 10.2.1 

State 1989 1999 
Arizona 523 525 
Connecticut 498 509 
Alabama 539 555 
Indiana 487 498 
Kansas 561 576 
Oregon 509 525 
Nebraska 560 571 
New York 496 502 
Virginia 507 499 
Washington 515 526 
Illinois 539 585 
North Carolina 469 493 
Georgia 475 482 
Nevada 512 517 
Ohio 520 568 
New Hampshire 510 518 


Using the ANOVA procedure, test that the mean SAT score for math in 1999 is greater than
that in 1989 at $\alpha$ = 0.05. Assume that the variances are equal and the samples come from
a normal distribution.


10.2.5. Let $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ be two sets of independent, normally distributed random
variables with means $\mu_1$ and $\mu_2$, and the common variance $\sigma^2$. Show that the two-sample
t-test and the analysis of variance are equivalent for testing $H_0: \mu_1 = \mu_2$ versus $H_a: \mu_1 > \mu_2$.


10.3 ANALYSIS OF VARIANCE FOR COMPLETELY RANDOMIZED DESIGN


In this section, we study the hypothesis testing problem of comparing population means for more 
than two independent populations, where the data are about several independent groups (different 
treatments being applied, or different populations being sampled). We have seen in Chapter 9 that the 
random selection of independent samples from k populations is known as a completely randomized 
experimental design or one-way classification. 


Let $\mu_1, \ldots, \mu_k$ be the means of k normal populations with unknown but equal variance $\sigma^2$. The
question is whether the means of these groups are different or are all equal. The idea is to consider
the overall variability in the data. We partition the variability into two parts: (1) between-groups
variability and (2) within-groups variability. If the between-groups variability is much larger than
the within-groups variability, this will indicate that differences between the groups are real, not
merely due to the random nature of sampling. Let independent samples be drawn of sizes $n_i$,
$i = 1, 2, \ldots, k$, and let $N = n_1 + \cdots + n_k$. Let $y_{ij}$ be the measured response on the jth
experimental unit in the ith sample. That is, $Y_{ij}$ is the jth observation from population i,
$i = 1, 2, \ldots, k$, and $j = 1, 2, \ldots, n_i$. Let $\bar{y}$ be the overall mean of all observations. The problem
can be formulated as a hypothesis testing problem, where we need to test

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k \quad \text{vs.} \quad H_a: \text{Not all the } \mu_i\text{'s are equal.}$$

The method of analysis of variance tests the null hypothesis $H_0$ by comparing two unbiased estimates
of the variance $\sigma^2$: one estimate based on variations from sample to sample and the other based
on variations within the samples. We will reject $H_0$ if the first estimate is significantly larger
than the second, so that the samples cannot be assumed to come from the same population.


We can partition the total sum of squares of deviations of the response measurements about their overall
mean for the k samples into two parts, one from the treatment (SST) and one from the error (SSE). This
partition gives the fundamental relationship in ANOVA, where total variation is divided into two
portions: between-sample variation and within-sample variation. That is,


Total SS = SST + SSE. 


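The following derivations will make computation of these quantities simpler. The total SS can be
written as

$$\text{Total SS} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2
 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}^2 - 2\bar{y}\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij} + N\bar{y}^2.$$

Note that $\bar{y} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}}{N}$, and then we have

$$\text{Total SS} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}^2 - \mathrm{CM},$$

where CM is the correction factor for the mean and is given by

$$\mathrm{CM} = \frac{\left(\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}\right)^2}{N} = N\bar{y}^2.$$

Let

$$T_i = \sum_{j=1}^{n_i} y_{ij}$$

be the sum of all the observations in the ith sample, and

$$\bar{T}_i = \frac{T_i}{n_i},$$

the mean of the observations in the ith sample.

We can rewrite $\bar{y}$ as

$$\bar{y} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}}{N} = \frac{\sum_{i=1}^{k} n_i \bar{T}_i}{N}.$$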


Now, we introduce SST, the sum of squares for treatment (sometimes known as between group sum
of squares, SSB), by

$$\mathrm{SST} = \sum_{i=1}^{k} n_i (\bar{T}_i - \bar{y})^2.$$

We note that $\bar{T}_i$ is the mean response due to the ith treatment and $\bar{y}$ is the overall mean. A large
value of $(\bar{T}_i - \bar{y})$ is likely to be caused by the ith treatment effect being much different from the rest.
Hence SST can be used to measure the differences in the treatment effects.


Thus, the sum of squares of errors (SSE) is


SSE = Total SS - SST.


Note that SSE is the sum of squares within groups (thus, SSE is sometimes referred to as the
within group sum of squares, SSW), and this can be seen from rewriting the expression as

$$\mathrm{SSE} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{T}_i)^2.$$
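
The within-group form of SSE follows from a one-line expansion. The derivation below is a sketch of that
step, using only the definitions of $T_i$, $\bar{T}_i$, and $\bar{y}$ given in this section.

```latex
\begin{align*}
\sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2
 &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} \bigl[(y_{ij} - \bar{T}_i) + (\bar{T}_i - \bar{y})\bigr]^2 \\
 &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{T}_i)^2
   + 2\sum_{i=1}^{k} (\bar{T}_i - \bar{y}) \sum_{j=1}^{n_i} (y_{ij} - \bar{T}_i)
   + \sum_{i=1}^{k} n_i (\bar{T}_i - \bar{y})^2 \\
 &= \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{T}_i)^2 + \mathrm{SST},
\end{align*}
```

because $\sum_{j=1}^{n_i} (y_{ij} - \bar{T}_i) = T_i - n_i\bar{T}_i = 0$ for every i, so the cross term vanishes. Subtracting
SST from the total SS therefore leaves exactly the within-group sum of squares.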
The decomposition of total sum of squares can be easily seen in Figure 10.1.


FIGURE 10.1 Decomposition of total SS. [Diagram: the total sum of squares splits into SST (or between
group sum of squares), $\sum_{i=1}^{k} n_i (\bar{T}_i - \bar{y})^2$, and SSE (or within group sum of squares).]


Figure 10.2 represents one point for each observation against each sample, with SM representing the
sample means and GM representing the grand mean. The dotted line between SMs and GM is the
distance between them. Taking these distances, squaring, multiplying by the corresponding sample
sizes, and summing, we get SST. To obtain SSE, we take the distance from each group mean, SM, to
each member of the group, square them, and add them. In addition, to give an idea of within-group
variations, it is customary to draw side-by-side box plots.


FIGURE 10.2 ANOVA decomposition. [Plot: observations against sample, with sample means (SM) and the
grand mean (GM) marked.]
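
The verbal recipe for SST and SSE (square the distances, weight by sample size, and sum) can be sketched
in a few lines of Python. The function name `sums_of_squares` and the data are hypothetical and serve only
to illustrate that the two pieces add up to the total sum of squares.

```python
def sums_of_squares(groups):
    """Return (SST, SSE, Total SS) computed directly from squared distances."""
    all_values = [y for group in groups for y in group]
    grand_mean = sum(all_values) / len(all_values)        # GM

    sst = 0.0
    sse = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)              # SM for this sample
        # Distance from SM to GM, squared and weighted by the sample size.
        sst += len(group) * (group_mean - grand_mean) ** 2
        # Distances from each member of the group to its own SM, squared.
        sse += sum((y - group_mean) ** 2 for y in group)

    total_ss = sum((y - grand_mean) ** 2 for y in all_values)
    return sst, sse, total_ss


# Hypothetical data: three small samples of unequal size.
groups = [[4, 6, 8], [1, 3], [9, 10, 11, 14]]
sst, sse, total_ss = sums_of_squares(groups)
print(sst, sse, total_ss)
```

Up to floating-point rounding, the first two printed values sum to the third.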


As mentioned earlier, SST estimates the variation among the $\mu$'s, and hence if all the $\mu$'s were equal,
the $\bar{T}_i$'s would be similar and SST would be small. It can be verified that the unbiased estimator
of $\sigma^2$ based on $(n_1 + n_2 + \cdots + n_k - k)$ degrees of freedom is

$$S^2 = \mathrm{MSE} = \frac{\mathrm{SSE}}{n_1 + n_2 + \cdots + n_k - k} = \frac{\mathrm{SSE}}{N - k}.$$

Note that the quantity MSE is a measure of variability within the groups. If there were only one group
with n observations, then the MSE is nothing but the sample variance, $s^2$. The fact that ANOVA deals
simultaneously with all the k groups can be seen by rewriting MSE in the following form:

$$\mathrm{MSE} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2 + \cdots + (n_k - 1)s_k^2}{(n_1 - 1) + (n_2 - 1) + \cdots + (n_k - 1)}.$$

The mean square for treatments with (k - 1) degrees of freedom is

$$\mathrm{MST} = \frac{\mathrm{SST}}{k - 1}.$$

The MST is a measure of the variability between the sample means of the groups. We now summarize
the analysis of variance hypothesis testing method for two or more populations.


ONE-WAY ANALYSIS OF VARIANCE FOR $k \geq 2$ POPULATIONS
We test

$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ versus
$H_a$: At least two of the $\mu_i$'s are different.
When $H_0$ is true, we have

$$E(\mathrm{MST}) = E(\mathrm{MSE}).$$

The greater the differences among the $\mu$'s, the larger E(MST) will be relative to E(MSE).
Test statistic:

$$F = \frac{\mathrm{MST}}{\mathrm{MSE}}.$$

Rejection region is

$$\text{RR}: F > F_\alpha,$$

with $\nu_1 = (k - 1)$ numerator degrees of freedom and $\nu_2 = \sum_{i=1}^{k} n_i - k = N - k$ denominator degrees of
freedom, where $N = \sum_{i=1}^{k} n_i$.
Assumptions: The observations $Y_{ij}$ are assumed to be independent and normally distributed with mean
$\mu_i$, $i = 1, 2, \ldots, k$, and variance $\sigma^2$.


Now we give a five-step computational procedure that we could follow for analysis of variance for
the completely randomized design.


ONE-WAY ANALYSIS OF VARIANCE PROCEDURE FOR $k \geq 2$ POPULATIONS
We test
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ versus
$H_a$: At least two of the $\mu_i$'s are different.

1. Compute

$$T_i = \sum_{j=1}^{n_i} y_{ij}, \qquad T = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}, \qquad \text{and} \qquad \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}^2,$$

$$\mathrm{CM} = \frac{\left(\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}\right)^2}{N} = \frac{T^2}{N}, \quad \text{where } N = \sum_{i=1}^{k} n_i,$$

$$\bar{T}_i = \frac{T_i}{n_i},$$

and

$$\text{Total SS} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}^2 - \mathrm{CM}.$$

2. Compute the sum of squares between samples (treatments),

$$\mathrm{SST} = \sum_{i=1}^{k} \frac{T_i^2}{n_i} - \mathrm{CM},$$

and the sum of squares within samples,

$$\mathrm{SSE} = \text{Total SS} - \mathrm{SST}.$$

Let

$$\mathrm{MST} = \frac{\mathrm{SST}}{k - 1}$$

and

$$\mathrm{MSE} = \frac{\mathrm{SSE}}{N - k}.$$

3. Compute the test statistic:

$$F = \frac{\mathrm{MST}}{\mathrm{MSE}}.$$

4. For a given $\alpha$, find the rejection region as

$$\text{RR}: F > F_\alpha,$$

with $\nu_1 = (k - 1)$ numerator degrees of freedom and $\nu_2 = \left(\sum_{i=1}^{k} n_i\right) - k = N - k$ denominator
degrees of freedom, where $N = \sum_{i=1}^{k} n_i$.

5. Conclusion: If the test statistic F falls in the rejection region, conclude that the sample evidence
supports the alternative hypothesis that the means are indeed different for the k treatments and are
not all equal.


Assumptions: The samples are randomly selected from the k populations in an independent manner. The
populations are assumed to be normally distributed with equal variances $\sigma^2$ and means $\mu_1, \ldots, \mu_k$.
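
Steps 1 through 3 of the procedure can be sketched in Python using the shortcut formulas based on the
sample totals $T_i$ and the correction for the mean, CM. The function name `one_way_anova`, the dictionary
keys, and the data are assumptions made for illustration; the critical value in step 4 is not computed here
and would come from an F-table or statistical software.

```python
def one_way_anova(samples):
    """One-way ANOVA via the shortcut formulas (steps 1-3 of the procedure)."""
    k = len(samples)
    sizes = [len(s) for s in samples]
    n_total = sum(sizes)                                   # N

    # Step 1: sample totals T_i, grand total T, correction for the mean, Total SS.
    totals = [sum(s) for s in samples]
    grand_total = sum(totals)
    cm = grand_total ** 2 / n_total
    total_ss = sum(y * y for s in samples for y in s) - cm

    # Step 2: sums of squares and mean squares.
    sst = sum(t * t / n for t, n in zip(totals, sizes)) - cm
    sse = total_ss - sst
    mst = sst / (k - 1)
    mse = sse / (n_total - k)

    # Step 3: test statistic, returned with its degrees of freedom.
    return {"SST": sst, "SSE": sse, "MST": mst, "MSE": mse,
            "F": mst / mse, "df": (k - 1, n_total - k)}


# Hypothetical data: three samples of unequal size.
table = one_way_anova([[4, 6, 8], [1, 3], [9, 10, 11, 14]])
print(table)
```

The `df` entry gives the numerator and denominator degrees of freedom needed to look up $F_\alpha$.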


10.3.1 The p-Value Approach 


Note that if we are using statistical software packages, the p-value approach can be used for the 
testing. Just compare the p-value and a to arrive at a conclusion. Refer to the computer examples in 
Section 10.7. 


The following example illustrates the ANOVA procedure. 


Example 10.3.1 
The three random samples in Table 10.1 represent test scores from three classes of statistics taught by
three different instructors and are independently obtained. Assume that the three different populations
are normal with equal variances.
At the $\alpha$ = 0.05 level of significance, test for equality of population means.


Table 10.1
Sample 1    Sample 2    Sample 3
64          56          81
84          74          92
75          69          84
77
80

Solution
We test

$H_0: \mu_1 = \mu_2 = \mu_3$ versus $H_a$: At least two of the $\mu$'s are different.

Here, k = 3, $n_1 = 5$, $n_2 = 3$, $n_3 = 3$, and $N = n_1 + n_2 + n_3 = 11$.

Also,

            Sample 1    Sample 2    Sample 3
$T_i$         380         199         257
$n_i$         5           3           3
$\bar{T}_i$   76.00       66.33       85.67

Clearly, the sample means are different. The question we are going to answer is: Is this difference due to just
chance, or is it due to a real difference caused by different teaching styles? For this, we now compute the
following:

$$\mathrm{CM} = \frac{\left(\sum_{i}\sum_{j} y_{ij}\right)^2}{N} = \frac{(836)^2}{11} = 63{,}536,$$

$$\text{Total SS} = \sum_{i}\sum_{j} y_{ij}^2 - \mathrm{CM} = 64{,}558 - 63{,}536 = 1022,$$

$$\mathrm{SST} = \sum_{i} \frac{T_i^2}{n_i} - \mathrm{CM}
 = \left(\frac{380^2}{5} + \frac{199^2}{3} + \frac{257^2}{3}\right) - \mathrm{CM}
 = 64{,}096.66 - 63{,}536 = 560.66,$$

$$\mathrm{SSE} = \text{Total SS} - \mathrm{SST} = 1022 - 560.66 = 461.34.$$

Hence,

$$\mathrm{MST} = \frac{\mathrm{SST}}{k - 1} = \frac{560.66}{2} = 280.33$$

and

$$\mathrm{MSE} = \frac{\mathrm{SSE}}{N - k} = \frac{461.34}{8} = 57.67.$$

The test statistic is

$$F = \frac{\mathrm{MST}}{\mathrm{MSE}} = \frac{280.33}{57.67} = 4.86.$$

From the F-table, $F_{0.05,2,8} = 4.46$.
Therefore, the rejection region is given by

$$\text{RR}: F > 4.46.$$

Decision: Because the observed value of F = 4.86 falls in the rejection region, we reject $H_0$ and conclude
that there is sufficient evidence to indicate a difference in the true means.
If we want the p-value, we can see from the F-table that 0.025 < p-value < 0.05, indicating the rejection
of the null hypothesis with $\alpha$ = 0.05. Using statistical software packages, we can get the exact p-value.
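
For the special case of two numerator degrees of freedom, as in this example, the upper tail of the
F-distribution has a simple closed form, $P(F_{2,\nu_2} > f) = (1 + 2f/\nu_2)^{-\nu_2/2}$, so the p-value can be
checked without tables. The sketch below relies on that special case only; the function name is
hypothetical, and for other numerator degrees of freedom a statistical package would be needed.

```python
def f_upper_tail_df1_2(f_value, df2):
    """P(F > f_value) for an F distribution with 2 numerator and df2 denominator df.

    Valid only when the numerator degrees of freedom equal 2.
    """
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)


# Observed F from Example 10.3.1 with (2, 8) degrees of freedom.
p_value = f_upper_tail_df1_2(4.86, 8)
print(round(p_value, 3))
```

The result lies between 0.025 and 0.05, in agreement with the bounds read from the F-table.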


The calculations obtained in analyzing the total sum of squares into its components are usually 
summarized by the analysis-of-variance table (ANOVA table), given in Table 10.2. 


Sometimes, one may also add a column for the p-value, $P(F_{k-1,N-k} > \text{observed } F)$, in the ANOVA
table.


For the previous example, we can summarize the computations by the ANOVA table shown in 
Table 10.3. 


Table 10.2

Source of      Degrees of    Sum of squares                                            Mean squares                  F-statistic
variation      freedom

Treatments     k - 1         $\mathrm{SST} = \sum_{i=1}^{k} T_i^2/n_i - \mathrm{CM}$               $\mathrm{MST} = \mathrm{SST}/(k-1)$    MST/MSE

Error          N - k         SSE = Total SS - SST                                      $\mathrm{MSE} = \mathrm{SSE}/(N-k)$

Total          N - 1         $\text{Total SS} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (y_{ij} - \bar{y})^2$


Table 10.3

Source of      Degrees of    Sum of      Mean      F-statistic    p-Value
variation      freedom       squares     square

Treatments     2             560.66      280.33    4.86           0.042

Error          8             461.34      57.67

Total          10            1022


10.3.2 Testing the Assumptions for One-Way ANOVA


The randomness assumption could be tested using the Wald-Wolfowitz test (see Project 12B). The
assumption of independence of the samples is hard to test without knowing how the data are collected
and should be ensured at the design stage, when the data are collected. Normality can be tested
(this should be performed separately for each sample, not for the total data set) using probability
plots or other tests such as the chi-square goodness-of-fit test. ANOVA is fairly robust against violation
of this assumption if the sample sizes are equal. Also, if the sample sizes are fairly large, the central
limit theorem helps. The presence of outliers is likely to increase the sample variance, thus decreasing
the value of the F-statistic for ANOVA, which will result in a lower power of the test. Box plots or
probability plots could be used to identify the outliers. If the normality test fails, transforming the
data (see Section 14.4.2) or a nonparametric test such as the Kruskal-Wallis test described in Section
12.5.1 may be more appropriate. If the sample sizes are equal, ANOVA is mostly
robust to violation of homogeneity of the variances. A rule of thumb used for robustness for this
condition is that the ratio of the largest sample variance to the smallest sample
variance should be no more than 3:1. Another popular rule of thumb used in one-way ANOVA
to verify the requirement of equality of variances is that the largest sample standard deviation not
be larger than two times the smallest sample standard deviation. Graphically, representing side-by-side
box plots of the samples can also reveal lack of homogeneity of variances if some box plots are
much longer than others (see Figure 10.3e). For a significance test on the homogeneity of variances
(Levene's test), refer to Section 14.4.3. If these tests reveal that the variances are different, then the
populations are different, in spite of what ANOVA concludes about differences of the means. But this
itself is significant, because it shows that the treatments had an effect.
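
The two rules of thumb for homogeneity of variances are easy to automate. The following Python sketch is
illustrative: the function name `variance_rules_of_thumb` is hypothetical, the thresholds 3 and 2 are the ones
quoted above, and the data are the test scores of Table 10.1. It is a quick screening device, not a
substitute for a formal test such as Levene's test.

```python
import math
import statistics


def variance_rules_of_thumb(samples):
    """Check the 3:1 variance-ratio rule and the 2:1 standard-deviation rule."""
    variances = [statistics.variance(s) for s in samples]   # unbiased sample variances
    variance_ratio = max(variances) / min(variances)
    sd_ratio = math.sqrt(variance_ratio)
    return {
        "variance_ratio": variance_ratio,
        "passes_3_to_1_rule": variance_ratio <= 3.0,
        "sd_ratio": sd_ratio,
        "passes_2_to_1_rule": sd_ratio <= 2.0,
    }


# Test scores from Table 10.1.
scores = [[64, 84, 75, 77, 80], [56, 74, 69], [81, 92, 84]]
check = variance_rules_of_thumb(scores)
print(check)
```

Note that the two rules are not equivalent: a standard-deviation ratio of 2 corresponds to a variance ratio
of 4, so the 3:1 variance rule is the stricter of the two.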


Example 10.3.2 
In order to study the effect of automobile size on noise pollution, the following data are randomly
chosen from the air pollution data (source: A. Y. Lewin and M. F. Shakun, Policy Sciences: Methodology and
Cases, Pergamon Press, 1976, p. 313). The automobiles are categorized as small, medium, and large, and
the noise level readings (decibels) are given in Table 10.4.


Table 10.4
Noise level (decibels) by size of automobile

Small    Medium    Large
820      840       785
820      825       775
825      815       770
835      855       760
825      840       770


At the $\alpha$ = 0.05 level of significance, test for equality of population mean noise levels for different sizes of
the automobiles. Comment on the assumptions.


Solution

Let $\mu_1$, $\mu_2$, $\mu_3$ be population mean noise levels for small, medium, and large automobiles, respectively.
First we test for the assumptions. Using Minitab to run tests for each of the samples, we can justify the
assumption of randomness of the sample values. A normality test for each column gives the graphs shown in
Figures 10.3a through 10.3c, through which we can reasonably assume the normality. Because the sample sizes
are equal, we will use the one-way ANOVA method to analyze these data.

Figure 10.3d indicates that the relative positions of the sample means are different, and Figure 10.3e (Minitab
steps for creating side-by-side box plots are given at the end of Example 10.7.1) gives an indication of within-
group variations; perhaps the group 2 (medium-size) variance is larger. Now, we will do the analytic testing.


FIGURE 10.3(a) Normal plot for noise level of small automobiles. [Average: 825; Std Dev: 6.12372; N: 5;
Kolmogorov-Smirnov normality test: D+: 0.200, D-: 0.149, D: 0.200; approximate p-value > 0.15.]

FIGURE 10.3(b) Normal plot for noise level of medium-sized automobiles. [Average: 835; Std Dev: 15.4110; N: 5;
Kolmogorov-Smirnov normality test: D+: 0.142, D-: 0.127, D: 0.142; approximate p-value > 0.15.]

FIGURE 10.3(c) Normal plot for noise level of large automobiles. [Average: 772; Std Dev: 9.08295; N: 5;
Kolmogorov-Smirnov normality test: D+: 0.171, D-: 0.124, D: 0.171; approximate p-value > 0.15.]

FIGURE 10.3(d) Mean decibel levels for three sizes of automobiles. [Plot of mean against sample.]

FIGURE 10.3(e) Side-by-side box plots for decibel levels for three sizes of automobiles. [Plot of decibels
against size of auto.]


We test
$H_0: \mu_1 = \mu_2 = \mu_3$ versus $H_a$: At least two of the $\mu$'s are different.


Here, k = 3, $n_1 = 5$, $n_2 = 5$, $n_3 = 5$, and $N = n_1 + n_2 + n_3 = 15$.
Also

            Small    Medium    Large
$T_i$         4125     4175      3860
$n_i$         5        5         5
$\bar{T}_i$   825      835       772


In the following calculations, for convenience we will approximate all values to the nearest integer.

$$\mathrm{CM} = \frac{\left(\sum_{i}\sum_{j} y_{ij}\right)^2}{N} = \frac{(12{,}160)^2}{15} = 9{,}857{,}707,$$

$$\text{Total SS} = \sum_{i}\sum_{j} y_{ij}^2 - \mathrm{CM} = 12{,}893,$$

$$\mathrm{SST} = \sum_{i} \frac{T_i^2}{n_i} - \mathrm{CM} = 11{,}463,$$

$$\mathrm{SSE} = \text{Total SS} - \mathrm{SST} = 1430.$$

Hence,

$$\mathrm{MST} = \frac{\mathrm{SST}}{k - 1} = \frac{11{,}463}{2} = 5732$$

and

$$\mathrm{MSE} = \frac{\mathrm{SSE}}{N - k} = \frac{1430}{12} = 119.$$

The test statistic is

$$F = \frac{\mathrm{MST}}{\mathrm{MSE}} = 48.10,$$

where the quotient is computed from the unrounded values of MST and MSE.

From the table, we get $F_{0.05,2,12} = 3.89$. Because the test statistic falls in the rejection region, we reject
at $\alpha$ = 0.05 the null hypothesis that the mean noise levels are the same. We conclude that size of the
automobile does affect the mean noise level.
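
Because the worked solution rounds every intermediate value to the nearest integer, it can be helpful to
see the unrounded quantities. The Python sketch below recomputes them from the noise-level data of
Table 10.4; the variable names are illustrative, and the critical value is still taken from the F-table.

```python
# Noise levels (decibels) from Table 10.4.
small = [820, 820, 825, 835, 825]
medium = [840, 825, 815, 855, 840]
large = [785, 775, 770, 760, 770]
samples = [small, medium, large]

k = len(samples)
n_total = sum(len(s) for s in samples)

# Correction for the mean and the sums of squares via the shortcut formulas.
cm = sum(sum(s) for s in samples) ** 2 / n_total
total_ss = sum(y * y for s in samples for y in s) - cm
sst = sum(sum(s) ** 2 / len(s) for s in samples) - cm
sse = total_ss - sst

mst = sst / (k - 1)
mse = sse / (n_total - k)
f_stat = mst / mse

print("CM =", round(cm, 2), " Total SS =", round(total_ss, 2))
print("SST =", round(sst, 2), " SSE =", round(sse, 2))
print("MST =", round(mst, 2), " MSE =", round(mse, 2), " F =", round(f_stat, 2))
```

Working with unrounded mean squares explains why the reported F differs slightly from the quotient of the
two rounded mean squares.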


It should be noted that the alternative hypothesis $H_a$ in this section covers a wide range of situations,
from the case where all but one of the population means are equal to the case where they are all
different. Hence, with such an alternative, if the samples lead us to reject the null hypothesis, we are
left with a lot of unsettled questions about the means of the k populations. Addressing these questions is
called post hoc testing. This problem of multiple comparisons is the topic of Section 10.5.


10.3.3 Model for One-Way ANOVA (Optional) 


We conclude this section by presenting the classical model for one-way ANOVA. Because the variables
$Y_{ij}$ are random samples from normal populations with $E(Y_{ij}) = \mu_i$ and with common variance
$Var(Y_{ij}) = \sigma^2$, for $i = 1, \ldots, k$ and $j = 1, \ldots, n_i$, we can write a model as

$$Y_{ij} = \mu_i + \varepsilon_{ij}, \quad j = 1, \ldots, n_i,$$

where the error terms $\varepsilon_{ij}$ are independent normally distributed random variables with $E(\varepsilon_{ij}) = 0$ and
$Var(\varepsilon_{ij}) = \sigma^2$. Let $\alpha_i = \mu_i - \mu$ be the difference of $\mu_i$ (ith population mean) from the grand mean $\mu$.
Then $\alpha_i$ can be considered as the ith treatment effect. Note that the $\alpha_i$ values are nonrandom. Because
$\mu = \sum_i (n_i \mu_i / N)$, it follows that $\sum_{i=1}^{k} n_i \alpha_i = 0$. This will result in the following classical model for
one-way layout:

$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}, \quad i = 1, \ldots, k, \quad j = 1, \ldots, n_i.$$

With this representation, the test $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ reduces to testing the null hypothesis that
there is no treatment effect, $H_0: \alpha_i = 0$, for $i = 1, \ldots, k$.
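
The constraint on the treatment effects follows directly from the definition of the grand mean as the
weighted average of the population means. A short sketch of that step, writing the ith treatment effect as
$\alpha_i = \mu_i - \mu$:

```latex
\begin{align*}
\sum_{i=1}^{k} n_i \alpha_i
 &= \sum_{i=1}^{k} n_i (\mu_i - \mu)
  = \sum_{i=1}^{k} n_i \mu_i - \mu \sum_{i=1}^{k} n_i \\
 &= N\mu - N\mu = 0,
 \qquad \text{since } \mu = \frac{1}{N}\sum_{i=1}^{k} n_i \mu_i .
\end{align*}
```

When all sample sizes are equal, the weights cancel and the constraint reduces to $\sum_{i=1}^{k} \alpha_i = 0$.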


EXERCISES 10.3 


10.3.1. In an effort to investigate the premium charged by insurance companies for auto insurance,
an agency randomly selects a few drivers who are insured by one of three different
companies. These individuals have similar cars, driving records, and levels of coverage.
Table 10.3.1 gives the premiums paid per 6 months by these drivers with these three
companies.


Table 10.3.1
Company I    Company II    Company III


396          348           378
438          360           330
336          522           294
318          474           432

(a) Construct an analysis-of-variance table and interpret the results. 

(b) Using the 5% significance level, test the null hypothesis that the mean auto insurance 
premium paid per 6 months by all drivers insured for each of these companies is the 
same. Assume that the conditions of completely randomized design are met. 


10.3.2. Three classes in elementary statistics are taught by three different persons: a regular faculty 
member, a graduate teaching assistant, and an adjunct from outside the university. At the 
end of the semester, each student is given a standardized test. Five students are randomly 
picked from each of these classes, and their scores are as shown in Table 10.3.2. 


Table 10.3.2 

Faculty Teaching assistant Adjunct 
93 88 86 
61 90 56 
87 76 73 
75 82 90 
92 58 47 


(a) Construct an analysis-of-variance table and interpret your results. 

(b) Test at the 0.05 level whether there is a difference between the mean scores for the 
three persons teaching. Assume that the conditions of completely randomized design 
are met. 


10.3.3. Let $n_1 = n_2 = \cdots = n_k = n'$. Show that
10.3.4. For the sum of squares for treatment

$$\mathrm{SST} = \sum_{i=1}^{k} n_i (\bar{T}_i - \bar{y})^2,$$

show that

$$E(\mathrm{SST}) = (k - 1)\sigma^2 + \sum_{i=1}^{k} n_i (\mu_i - \mu)^2,$$

where $\mu = \frac{1}{N}\sum_{i=1}^{k} n_i \mu_i$.
[This exercise shows that the expected value of SST increases as the differences among the
$\mu$'s increase.]


10.3.5. (a) Show that

$$\mathrm{SSE} = \sum_{i=1}^{k} (n_i - 1) S_i^2 = \sum_{i=1}^{k}\sum_{j=1}^{n_i} (Y_{ij} - \bar{T}_i)^2,$$

where $S_i^2 = \frac{1}{n_i - 1}\sum_{j=1}^{n_i} (Y_{ij} - \bar{T}_i)^2$ provides an independent, unbiased estimator for
$\sigma^2$ in each of the k samples.
(b) Show that $\mathrm{SSE}/\sigma^2$ has a chi-square distribution with N - k degrees of freedom, where
$N = \sum_{i=1}^{k} n_i$.
10.3.6. Let each observation in a set of k independent random samples be normally distributed
with means $\mu_1, \ldots, \mu_k$ and common variance $\sigma^2$. If $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ is true,
show that

$$F = \frac{\mathrm{SST}/(k - 1)}{\mathrm{SSE}/(N - k)} = \frac{\mathrm{MST}}{\mathrm{MSE}}$$

has an F-distribution with k - 1 numerator and N - k denominator degrees of freedom.


10.3.7. The management of a grocery store observes various employees for work productivity. 
Table 10.3.3 gives the number of customers served by each of its four checkout lanes per 
hour. 


Table 10.3.3 


Lane 1   Lane 2   Lane 3   Lane 4


16 11 8 21 
18 14 12 16 
22 10 17 17 
21 10 10 23 
15 14 13 17 


10 15
(a) Construct an analysis-of-variance table and interpret the results. Indicate any assumptions
that were necessary.

(b) Test whether there is a difference between the mean number of customers served
by the four employees at the 0.05 level. Assume that the conditions of completely
randomized design are met.


10.3.8. Table 10.3.4 represents immunoglobulin levels (with each observation being the IgA 
immunoglobulin level measured in international units) of children under 10 years of age 
of a particular group. The children are grouped as follows: A: ages 1 to less than 3, B: ages 
3 to less than 6, C: ages 6 to less than 8, and D: ages 8 to less than 10. Test whether there 
is a difference between the means for each of the age groups. Use a = 0.05. Interpret your 
results and state any assumptions that were necessary to solve the problem. 


Table 10.3.4 


A 35 8 12 19 56 64 75 25


B 31 79 60 45 39 44 45 62 20 66 


C 74 56 77 35 95 81 28 


D 80 42 48 69 95 40 86 79 51 


10.3.9. Table 10.3.5 gives rental and homeowner vacancy rates by U.S. region (source: U.S. Census 
Bureau) for 5 years. 


Table 10.3.5
Rental units    1995    1996    1997    1998    1999


Northeast       7.2     7.4     6.7     6.7     6.3
Midwest         7.2     7.9     8.0     7.9     8.6
South           8.3     8.6     9.1     9.6     10.3
West            7.5     7.2     6.6     6.7     6.2


Test at the 0.01 level whether the true rental and homeowner vacancy rates by area are the 
same for all 5 years. Interpret your results and state any assumptions that were necessary 
to perform the analysis. 


10.3.10. Table 10.3.6 gives lower limits of income (approximated to the nearest $1000 and calculated 
as of March of the following year) of the top 5% of U.S. households by race from 1994 to 
1998 (Source: U.S. Census Bureau). 
Test at the 0.05 level whether the true lower limits of income for the top 5% of U.S.
households for each race are the same for all 5 years.

Table 10.3.6

Race          Year
              1994    1995    1996    1997    1998
All Races     110     113     120     127     132
White         113     117     123     130     136
Black         81      80      85      87      94
Hispanic      82      80      86      93      98

10.3.11. Table 10.3.7 gives mean serum cholesterol levels (given in milligrams per deciliter) by race 
and age in the United States between 1978 and 1980 (source: “Report of the National 
Cholesterol Education Program Expert Panel on Detection, Evaluation, and Treatment of 
High Blood Cholesterol in Adults,” Arch. Intern. Med. 148, January 1988). 


Table 10.3.7 
Race                           Age 
             20-24   25-34   35-44   45-54   55-64   65-74 
All races     180     199     217     227     229     221 
White         180     199     217     227     230     222 
Black         171     199     218     229     223     217 


Test at the 0.01 level whether the true mean cholesterol levels for all races in the United 
States between 1978 and 1980 are the same. 


10.4 TWO-WAY ANALYSIS OF VARIANCE, RANDOMIZED COMPLETE 
BLOCK DESIGN 


A randomized block design, or the two-way analysis of variance, consists of b blocks of k experimental 
units each. In many cases we may be required to measure the response at combinations of levels of two 
or more factors considered simultaneously. For example, we might be interested in comparing gas 
mileage among four different makes of cars for both in-city and highway driving, or in examining weight 
loss for five different diet programs among whites, African Americans, Hispanics, and Asians 
according to their gender. In studies involving various factors, the effect of each factor on the response 
variable may be analyzed using one-way classification. However, such an analysis is not efficient 
with respect to time, effort, and cost. Also, such a procedure gives no knowledge about the likely 
interactions that may exist among different factors. In such cases, the two-way analysis of variance is 
an appropriate statistical method to use. 


In a randomized block design, the treatments are randomly assigned to the units in each block, 
with each treatment appearing exactly once in every block. The model used here assumes that there 
is no interaction between treatments and blocks. Thus, the total number of observations obtained in 
a randomized block design is n = bk. The purpose of subdividing experiments into blocks is to 
eliminate as much variability as possible, that is, to reduce the experimental error or the variability 
due to extraneous causes. Refer to Section 9.2.3 for a procedure to obtain a randomized complete 
block design. The goal of such an experiment is to test the equality of levels for the treatment effect. 
Sometimes, it may also be of interest to test for a difference among blocks. We proceed to give a 
formal statistical model for the randomized complete block design. 


For $i = 1, 2, \ldots, k$ and $j = 1, 2, \ldots, b$, let $Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}$, where $Y_{ij}$ is the 
observation on treatment $i$ in block $j$, $\mu$ is the overall mean, $\alpha_i$ is the nonrandom effect of 
treatment $i$, $\beta_j$ is the nonrandom effect of block $j$, and $\varepsilon_{ij}$ are the random error terms such that 
$\varepsilon_{ij}$ are independent normally distributed random variables with $E(\varepsilon_{ij}) = 0$ and 
$\mathrm{Var}(\varepsilon_{ij}) = \sigma^2$. In this case, $\sum_{i=1}^{k} \alpha_i = 0$ and $\sum_{j=1}^{b} \beta_j = 0$. 


The analysis of variance for a randomized block design proceeds similarly to that for a completely 
randomized design, the main difference being that the total sum of squares of deviations of the 
response measurements from their means may be partitioned into three parts: the sum of squares of 
blocks (SSB), treatments (SST), and error (SSE). 


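Let $B_j = \sum_{i=1}^{k} y_{ij}$ and $\bar{B}_j$ denote, respectively, the total and the mean of all observations in 
block $j$. Represent the total and the mean of all observations receiving treatment $i$ by 
$T_i = \sum_{j=1}^{b} y_{ij}$ and $\bar{T}_i$, respectively. Let 

$$\bar{y} = \text{average of } n = bk \text{ observations} = \frac{1}{n} \sum_{j=1}^{b} \sum_{i=1}^{k} y_{ij}$$

and 

$$CM = \frac{1}{n} (\text{total of all observations})^2 = \frac{1}{n} \left( \sum_{j=1}^{b} \sum_{i=1}^{k} y_{ij} \right)^2.$$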


For convenience, we can represent the two-way classification as in Table 10.5. 


Note that from the table we can obtain $\sum_{j=1}^{b} \sum_{i=1}^{k} y_{ij} = \sum_{j=1}^{b} B_j$. Hence, 
$CM = (1/n) \left( \sum_{j=1}^{b} B_j \right)^2$. 


Table 10.5 
                               Blocks 
               1      2      ...    j      ...    b      Total T_i   Mean T̄_i 
Treatment 1    y_11   y_12   ...    y_1j   ...    y_1b   T_1         T̄_1 
Treatment 2    y_21   y_22   ...    y_2j   ...    y_2b   T_2         T̄_2 
...
Treatment i    y_i1   y_i2   ...    y_ij   ...    y_ib   T_i         T̄_i 
...
Treatment k    y_k1   y_k2   ...    y_kj   ...    y_kb   T_k         T̄_k 
Total B_j      B_1    B_2    ...    B_j    ...    B_b 
Mean B̄_j       B̄_1    B̄_2    ...    B̄_j    ...    B̄_b                ȳ 


Then for a randomized block design with b blocks and k treatments, we need to compute the following 
sums of squares. They are 

$$\text{Total SS} = SSB + SST + SSE = \sum_{j=1}^{b} \sum_{i=1}^{k} (y_{ij} - \bar{y})^2 = \sum_{j=1}^{b} \sum_{i=1}^{k} y_{ij}^2 - CM,$$

$$SSB = k \sum_{j=1}^{b} (\bar{B}_j - \bar{y})^2 = \frac{\sum_{j=1}^{b} B_j^2}{k} - CM,$$

$$SST = b \sum_{i=1}^{k} (\bar{T}_i - \bar{y})^2 = \frac{\sum_{i=1}^{k} T_i^2}{b} - CM,$$

and 

$$SSE = \text{Total SS} - SSB - SST.$$

We define 

$$MSB = \frac{SSB}{b-1}, \qquad MST = \frac{SST}{k-1}, \qquad \text{and} \qquad MSE = \frac{SSE}{n-b-k+1}.$$

The analysis of variance for the randomized block design is presented in Table 10.6. The column 
corresponding to d.f. represents the degrees of freedom associated with each sum of squares. MS 
denotes the mean square. 


Table 10.6 
Source        d.f.                          SS         MS 
Blocks        b − 1                         SSB        SSB/(b − 1) 
Treatments    k − 1                         SST        SST/(k − 1) 
Error         (b − 1)(k − 1) = n − b − k + 1   SSE        SSE/(n − b − k + 1) 
Total         n − 1                         Total SS 


To test the null hypothesis that there is no difference in treatment means, that is, to test 

$$H_0: \alpha_i = 0, \; i = 1, \ldots, k \quad \text{versus} \quad H_a: \text{Not all } \alpha_i\text{'s are zero},$$

we use the F-statistic 

$$F = \frac{MST}{MSE}$$

and reject $H_0$ if $F > F_\alpha$ based on $(k - 1)$ numerator and $(n - b - k + 1)$ denominator degrees of 
freedom. 


Although blocking lowers the experimental error, it also furnishes a chance to see whether evidence 
exists to indicate a difference in the mean response for blocks. In this case we will be testing the 
hypothesis 

$$H_0: \beta_j = 0, \; j = 1, \ldots, b \quad \text{versus} \quad H_a: \text{Not all } \beta_j\text{'s are zero}.$$

Under the assumption that there is no difference in the mean response for blocks, MSB provides an 
unbiased estimator for $\sigma^2$ based on $(b - 1)$ degrees of freedom. If there is a real difference 
among block means, MSB will be larger in comparison with MSE and 

$$F = \frac{MSB}{MSE}$$

will be used as a test statistic. The rejection region is $F > F_\alpha$ based on $(b - 1)$ numerator and 
$(n - b - k + 1)$ denominator degrees of freedom. 


We now summarize the foregoing methodology in a step-by-step computational procedure. For a 
reasonable data size, we could use scientific calculators for handling the ANOVA calculations. For 
larger data sets, the use of statistical software packages is recommended. 


COMPUTATIONAL PROCEDURE FOR RANDOMIZED BLOCK DESIGN 
1. Calculate the following quantities: 
(i) Sum the observations for each row to form row totals: 

$$T_1, T_2, \ldots, T_k, \quad \text{where } T_i = \sum_{j=1}^{b} y_{ij}.$$

(ii) Sum the observations for each column to form column totals: 

$$B_1, B_2, \ldots, B_b, \quad \text{where } B_j = \sum_{i=1}^{k} y_{ij}.$$

(iii) Find the sum of all observations: 

$$\sum_{j=1}^{b} \sum_{i=1}^{k} y_{ij} = \sum_{j=1}^{b} B_j.$$

2. Calculate the following quantities: 
(i) Square the sum of the column totals and divide it by n = bk to obtain 

$$CM = \frac{1}{n} \left( \sum_{j=1}^{b} B_j \right)^2.$$

(ii) Find the sum of squares of the column totals and divide it by k to obtain 

$$\frac{1}{k} \sum_{j=1}^{b} B_j^2, \qquad SSB = \frac{\sum_{j=1}^{b} B_j^2}{k} - CM, \qquad \text{and} \qquad MSB = \frac{SSB}{b-1}.$$

(iii) Find the sum of squares of the row totals and divide it by b to obtain 

$$\frac{1}{b} \sum_{i=1}^{k} T_i^2, \qquad SST = \frac{\sum_{i=1}^{k} T_i^2}{b} - CM, \qquad \text{and} \qquad MST = \frac{SST}{k-1}.$$

(iv) Find the sum of squares of individual observations: 

$$\sum_{j=1}^{b} \sum_{i=1}^{k} y_{ij}^2.$$

Also compute 

$$\text{Total SS} = \sum_{j=1}^{b} \sum_{i=1}^{k} y_{ij}^2 - CM.$$

(v) Using (ii), (iii), and (iv), find 

$$SSE = \text{Total SS} - SSB - SST \qquad \text{and} \qquad MSE = \frac{SSE}{n-b-k+1}.$$

3. To test the null hypothesis that there is no difference in treatment means: 
(i) Compute the F-statistic, 

$$F = \frac{MST}{MSE}.$$

(ii) From the F-table, find the value of $F_{\alpha, \nu_1, \nu_2}$, where $\nu_1 = (k - 1)$ is the numerator and 
$\nu_2 = (n - b - k + 1)$ the denominator degrees of freedom. 
(iii) Decision: Reject $H_0$ if $F > F_{\alpha, \nu_1, \nu_2}$ and conclude that there is evidence of a 
difference in treatment means at level α. 
4. To test the null hypothesis that there is no difference in the mean response for blocks: 
(i) Compute the F-statistic, 

$$F = \frac{MSB}{MSE}.$$

(ii) From the F-table, find the value of $F_{\alpha, \nu_1, \nu_2}$, where $\nu_1 = (b - 1)$ is the numerator and 
$\nu_2 = (n - b - k + 1)$ the denominator degrees of freedom. 
(iii) Decision: Reject $H_0$ if $F > F_{\alpha, \nu_1, \nu_2}$ and conclude that there is evidence of a 
difference in the mean response for blocks at level α. 
Assumptions: The samples are randomly selected in an independent manner from n = bk populations. 
The populations are assumed to be normally distributed with equal variances $\sigma^2$. Also, there are no 
interactions between the variables (two factors). 


We have already discussed the assumptions and how to verify those assumptions in one-way analysis. 
The only new assumption in the randomized block design is about the interactions. One of the 
ways to verify the assumption of no interaction is to plot the observed values against the sample 
number. If there is no interaction, the line segments (one for each block) will be parallel or nearly 
parallel; see Figure 9.2. If the lines are not approximately parallel, then there is likely to be 
interaction between blocks and treatments. In the presence of interactions, the analysis of this section 
needs to be modified. For details on those procedures, refer to more specialized books on ANOVA 
methods. 
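
For readers who prefer to script the arithmetic, the computational procedure for the randomized 
block design can be sketched in a few lines of Python. This is only an illustrative sketch using 
the standard library: the function name rbd_anova and the small data set are hypothetical and 
are not taken from the text. Rows of the input are treatments and columns are blocks. 

```python
def rbd_anova(y):
    """ANOVA quantities for a randomized block design (illustrative sketch).

    y[i][j] is the observation on treatment i (row) in block j (column).
    """
    k = len(y)       # number of treatments
    b = len(y[0])    # number of blocks
    n = b * k        # total number of observations

    T = [sum(row) for row in y]                              # row (treatment) totals
    B = [sum(y[i][j] for i in range(k)) for j in range(b)]   # column (block) totals
    cm = sum(B) ** 2 / n                                     # correction for the mean

    ssb = sum(Bj ** 2 for Bj in B) / k - cm                  # sum of squares for blocks
    sst = sum(Ti ** 2 for Ti in T) / b - cm                  # sum of squares for treatments
    total_ss = sum(v ** 2 for row in y for v in row) - cm
    sse = total_ss - ssb - sst

    msb = ssb / (b - 1)
    mst = sst / (k - 1)
    mse = sse / (n - b - k + 1)
    return {
        "CM": cm, "SSB": ssb, "SST": sst, "SSE": sse, "TotalSS": total_ss,
        "MSB": msb, "MST": mst, "MSE": mse,
        "F_treatments": mst / mse, "F_blocks": msb / mse,
    }


# Hypothetical data: 3 treatments (rows) observed in 3 blocks (columns).
example = [[10, 12, 14],
           [11, 15, 13],
           [20, 19, 24]]
result = rbd_anova(example)
print(result["F_treatments"], result["F_blocks"])
```

The two F-statistics returned would then be compared with the tabulated values of F for 
(k − 1, n − b − k + 1) and (b − 1, n − b − k + 1) degrees of freedom, respectively. 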


We illustrate the randomized block design procedure with the following example. 


Example 10.4.1 

A furniture company wants to know whether there are differences in stain resistance among the four 
chemicals used to treat three different fabrics. Table 10.7 shows the yields on resistance to stain (a low value 
indicates good stain resistance). 

At the α = 0.05 level of significance, is there evidence to conclude that there is a difference in mean 
resistance among the four chemicals? Is there any difference in the mean resistance among the materials? 
Give bounds for the p-values in each case. 


Table 10.7 
Chemical           Material 
            I     II    III    Total 
C1          3      7      6      16 
C2          9     11      8      28 
C3          2      5      7      14 
C4          7      9      8      24 
Total      21     32     29      82 


Solution 
Here $T_1 = 16$, $T_2 = 28$, $T_3 = 14$, and $T_4 = 24$. Also, $B_1 = 21$, $B_2 = 32$, and $B_3 = 29$. In addition, 
$b = 3$, $k = 4$, and $n = bk = 12$. Now 

$$CM = \frac{1}{n} \left( \sum_{j=1}^{b} B_j \right)^2 = \frac{1}{12} (82)^2 = 560.3333.$$

We can compute the following quantities: 

$$SSB = \frac{\sum_{j=1}^{b} B_j^2}{k} - CM = \frac{2306}{4} - 560.3333 = 16.1667,$$

$$MSB = \frac{SSB}{b-1} = \frac{16.1667}{2} = 8.0834,$$

$$SST = \frac{\sum_{i=1}^{k} T_i^2}{b} - CM = \frac{1812}{3} - 560.3333 = 43.6667,$$

and 

$$MST = \frac{SST}{k-1} = \frac{43.6667}{3} = 14.5556.$$

We have $\sum_{j=1}^{b} \sum_{i=1}^{k} y_{ij}^2 = 632$. From this 

$$\text{Total SS} = \sum_{j=1}^{b} \sum_{i=1}^{k} y_{ij}^2 - CM = 632 - 560.3333 = 71.6667,$$

$$SSE = \text{Total SS} - SSB - SST = 71.6667 - 16.1667 - 43.6667 = 11.8333,$$

and 

$$MSE = \frac{SSE}{n-b-k+1} = \frac{11.8333}{6} = 1.9722.$$

The F-statistic is 

$$F = \frac{MST}{MSE} = \frac{14.5556}{1.9722} = 7.3804.$$

From the F-table, $F_{0.05,3,6} = 4.76$. Because the observed value F = 7.3804 > 4.76, we reject the null 
hypothesis and conclude that there is a difference in mean resistance among the four chemicals. Because 
the observed F-value falls between the tabulated values for α = 0.025 and α = 0.01, the p-value falls 
between 0.01 and 0.025. 

To test for the difference in the mean resistance among the materials, 

$$F = \frac{MSB}{MSE} = \frac{8.0834}{1.9722} = 4.0987.$$

From the F-table, $F_{0.05,2,6} = 5.14$. Because the observed value of F = 4.0987 < 5.14, we do not reject 
the null hypothesis; there is not enough evidence to conclude that the mean resistance differs among 
the materials. Because the observed F-value falls between the tabulated values for α = 0.10 and 
α = 0.05, the p-value falls between 0.05 and 0.10. 
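
The bounds on the p-values read from the F-table can be checked numerically. The following is a 
minimal sketch, not a library routine: the helper f_upper_tail is a hypothetical name, and it 
relies on the standard identity P(F > f) = I_x(ν2/2, ν1/2) with x = ν2/(ν2 + ν1 f), where I_x is 
the regularized incomplete beta function, evaluated here by Simpson's rule. The sketch assumes 
ν1 ≥ 2 and ν2 ≥ 2 so that the integrand stays bounded. 

```python
import math


def f_upper_tail(f_obs, df1, df2, steps=2000):
    """Approximate P(F > f_obs) for an F(df1, df2) random variable.

    Sketch only: integrates the beta density kernel on [0, x] with the
    composite Simpson rule (steps must be even); assumes df1 >= 2, df2 >= 2.
    """
    a, b = df2 / 2.0, df1 / 2.0
    x = df2 / (df2 + df1 * f_obs)

    def kernel(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / steps
    total = kernel(0.0) + kernel(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * kernel(i * h)
    integral = total * h / 3.0

    beta_ab = math.gamma(a) * math.gamma(b) / math.gamma(a + b)
    return integral / beta_ab


# F-statistics from Example 10.4.1: chemicals (3, 6 d.f.) and materials (2, 6 d.f.).
p_chemicals = f_upper_tail(7.3804, 3, 6)
p_materials = f_upper_tail(4.0987, 2, 6)
print(p_chemicals, p_materials)
```

The two values should fall inside the respective bounds obtained from the F-table, namely 
(0.01, 0.025) for the chemicals and (0.05, 0.10) for the materials. 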

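EXERCISES 10.4 
10.4.1. Show that 

$$\sum_{j=1}^{b} \sum_{i=1}^{k} (y_{ij} - \bar{y})^2 = \sum_{i=1}^{k} \sum_{j=1}^{b} (y_{ij} - \bar{T}_i - \bar{B}_j + \bar{y})^2 + b \sum_{i=1}^{k} (\bar{T}_i - \bar{y})^2 + k \sum_{j=1}^{b} (\bar{B}_j - \bar{y})^2.$$

[Hint: Use the identity $y_{ij} - \bar{y} = (y_{ij} - \bar{T}_i - \bar{B}_j + \bar{y}) + (\bar{T}_i - \bar{y}) + (\bar{B}_j - \bar{y})$.] 


10.4.2. Show the following: 
(a) $E(MSE) = \sigma^2$, 
(b) $E(MSB) = \frac{k}{b-1} \sum_{j=1}^{b} \beta_j^2 + \sigma^2$, 
(c) $E(MST) = \frac{b}{k-1} \sum_{i=1}^{k} \alpha_i^2 + \sigma^2$. 


10.4.3. The least-square estimators of the parameters $\mu$, $\tau_i$'s, and $\beta_j$'s are obtained by minimizing 
the sum of squares 

$$\sum_{i=1}^{k} \sum_{j=1}^{b} (y_{ij} - \mu - \tau_i - \beta_j)^2$$

with respect to $\mu$, $\tau_i$'s, and $\beta_j$'s, subject to the restrictions $\sum_{i=1}^{k} \tau_i = \sum_{j=1}^{b} \beta_j = 0$. Show that the 
resultant estimators are 

$$\hat{\mu} = \bar{y},$$

$$\hat{\tau}_i = \bar{T}_i - \bar{y}, \quad i = 1, 2, \ldots, k,$$

and 

$$\hat{\beta}_j = \bar{B}_j - \bar{y}, \quad j = 1, 2, \ldots, b.$$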


10.4.4. In order to test the wear on four hyperalloys, a test piece of each alloy was extracted from 
each of the three positions of a test machine. The reduction of weight in milligrams due to 
wear was determined on each piece, and the data are given in Table 10.4.1. 
At a = 0.05, test the following hypotheses, regarding the positions as blocks: 
(a) There is no difference in average wear for each material. 
(b) There is no difference in average wear for each position. 
(c) Interpret your final result and state any assumptions that were necessary to solve the 
problem. 


Table 10.4.1 Loss in Weights Due to Wear Testing of Four Materials (in mg) 

                      Position 
Type of alloy      1      2      3 
1                241    270    274 
2                195    241    218 
3                235    273    230 
4                234    236    227 


10.4.5. For the data of Exercise 10.3.10, test at the 0.05 level that the true income lower limits of 
the top 5% of U.S. households for each race are the same for all 5 years. Also, test at the 
0.05 level that the true income lower limits of the top 5% of U.S. households for each year 
between 1994 and 1998 are the same. 


10.4.6. For the data of Exercise 10.3.11, test at the 0.01 level that the true mean cholesterol levels 
for all races in the United States during 1978-1980 are the same. Also, test at the 0.01 level 
that the true mean cholesterol levels for all ages in the United States during 1978-1980 are 
the same. 


10.4.7. In order to see the effect of hours of sleep on tests of different skill categories (vocabulary, 
reasoning, and arithmetic), tests consisting of 20 questions each in each category were given 
to 16 students, four each based on the hours of sleep they had on the previous night. Each 
right answer is given one point. Table 10.4.2 gives the cumulative scores of the four students 
in each sleep group, by category. 


Table 10.4.2 
Hours of sleep              Category 
                 Vocabulary   Reasoning   Arithmetic 
0                    44           33           35 
4                    54           38           18 
6                    48           42           43 
8                    55           52           50 


Test at the 0.05 level whether the true mean performance for different hours of sleep is the 
same. Also, test at the 0.05 level whether the true mean performance for each category of 
the test is the same. 


10.5 MULTIPLE COMPARISONS 


The analysis of variance procedures that we have used so far showed whether differences among several 
means are significant. However, if the equality of means is rejected, the F-test does not pinpoint 
which of the given means or group of means differs significantly from another given mean or group 
of means. With ANOVA, when the null hypothesis of equality of means is rejected, the problem is to 
see whether there is some way to follow up (post hoc) this initial test $H_0: \mu_1 = \mu_2 = \cdots = \mu_k$ by 
looking at subhypotheses, such as $H_0: \mu_1 = \mu_2$. 


This involves multiple tests. However, the solution is not to use a simple t-test repeatedly for every 
possible combination taken two at a time. That, apart from introducing many tests, will considerably 
increase the significance level, the probability of type I error. For example, to compare four samples 
we will need $\binom{4}{2} = 6$ tests. If each one of the comparisons is tested with the same value of 
α = P(type I error), and if all the null hypotheses involving six comparisons are true, then, treating 
the six tests as independent, the probability of rejecting at least one of them is 

$$P(\text{at least one type I error}) = 1 - (1 - \alpha)^6.$$

In particular, if α = 0.01, then P(at least one type I error) = 0.058520, which is considerably higher 
than the original error value of 0.01. 
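
To see how quickly this probability grows with the number of groups, the calculation can be 
scripted. The sketch below is illustrative only: the function name familywise_error is 
hypothetical, and the formula treats the pairwise tests as independent, which is an 
approximation. 

```python
from math import comb


def familywise_error(alpha, k):
    """Number of pairwise tests among k groups and the probability of at
    least one type I error, treating the tests as independent (a sketch)."""
    m = comb(k, 2)                     # number of pairwise comparisons
    return m, 1 - (1 - alpha) ** m


for k in (3, 4, 5, 10):
    m, p = familywise_error(0.01, k)
    print(k, m, round(p, 6))
```

With four groups the function gives six comparisons, matching the count used above. 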


One way to investigate the problem is to use a multiple comparison procedure. A good deal of work 
has been done on problems of multiple comparisons. There are a variety of techniques available 
in the literature, such as the Bonferroni procedure, Tukey’s method, and Scheffe’s method. We now 
describe one of the more popular procedures, called Tukey's method, for the completely randomized, 
one-factor design. 


In this multiple comparison problem, we would like to test $H_0: \mu_i = \mu_j$ versus $H_a: \mu_i \neq \mu_j$, for all 
$i \neq j$. Tukey’s method will be used to test all possible differences of means to decide whether at least 
one of the differences $\mu_i - \mu_j$ is considerably different from zero. In this comparison problem, Tukey's 
method makes use of confidence intervals for $\mu_i - \mu_j$. If each confidence interval has a confidence 
level $1 - \alpha$, then the probability that all confidence intervals include their respective parameters is 
less than $1 - \alpha$. We now describe this method where each of the k sample means is based on the 
common number of observations, n. 


Let N = kn be the total number of observations and let 

$$S^2 = \frac{1}{N-k} \sum_{i=1}^{k} \sum_{j=1}^{n} (Y_{ij} - \bar{T}_i)^2.$$

Let $\bar{T}_{\max} = \max(\bar{T}_1, \ldots, \bar{T}_k)$ and $\bar{T}_{\min} = \min(\bar{T}_1, \ldots, \bar{T}_k)$. Define the random variable 

$$Q = \frac{\bar{T}_{\max} - \bar{T}_{\min}}{S/\sqrt{n}}.$$

The distribution of Q under the null hypothesis $H_0: \mu_1 = \cdots = \mu_k$ is called the Studentized range 
distribution, which depends on the number of samples k and the degrees of freedom $\nu = N - k = 
(n - 1)k$. We denote the upper α critical value by $q_{\alpha,k,\nu}$. The Studentized range distribution table gives 
values for selected values of k, ν, and α = 0.01, 0.05, and 0.10. The following theorem, due to Tukey, 
defines the test procedure. 
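
The definition of Q can be made concrete with a small Monte Carlo experiment. The sketch below 
is an illustration under stated assumptions and not part of Tukey's procedure itself: it 
generates k samples of size n from a standard normal population (so that the null hypothesis 
holds), computes Q for each replication, and takes an empirical upper percentile as a rough 
stand-in for the tabulated critical value. The function names, the seed, and the number of 
replications are arbitrary choices. 

```python
import random


def simulate_q(k, n, reps=20000, seed=1):
    """Simulate the Studentized range statistic Q under H0 (sketch).

    Each replication draws k samples of size n from N(0, 1) and returns
    (max mean - min mean) / (S / sqrt(n)), with S^2 the pooled variance
    on N - k = k(n - 1) degrees of freedom.
    """
    rng = random.Random(seed)
    values = []
    for _ in range(reps):
        groups = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]
        means = [sum(g) / n for g in groups]
        ss = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
        s = (ss / (k * (n - 1))) ** 0.5
        values.append((max(means) - min(means)) / (s / n ** 0.5))
    values.sort()
    return values


def upper_critical_value(sorted_q, alpha):
    """Empirical upper-alpha point of the simulated Q values."""
    index = min(int((1 - alpha) * len(sorted_q)), len(sorted_q) - 1)
    return sorted_q[index]


q_sim = simulate_q(k=5, n=5)
print(upper_critical_value(q_sim, 0.05))
```

For k = 5 and ν = 20 the simulated value should be close to the entry in the Studentized range 
table; the simulation is only approximate and does not replace the table. 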


Theorem 10.5.1 Let $\bar{T}_i$, $i = 1, 2, \ldots, k$ be the k sample means in a completely randomized design. Let 
$\mu_i$, $i = 1, 2, \ldots, k$ be the true means and let $n_i = n$ be the common sample size. Then the probability that 
all $\binom{k}{2}$ differences $\mu_i - \mu_j$ will simultaneously satisfy the inequalities 

$$(\bar{T}_i - \bar{T}_j) - q_{\alpha,k,\nu} \frac{S}{\sqrt{n}} \leq \mu_i - \mu_j \leq (\bar{T}_i - \bar{T}_j) + q_{\alpha,k,\nu} \frac{S}{\sqrt{n}}$$

is $(1 - \alpha)$, where $q_{\alpha,k,\nu}$ is the upper α critical value of the Studentized range distribution. If, for a given i and 
j, zero is not contained in the preceding inequality, $H_0: \mu_i = \mu_j$ can be rejected in favor of $H_a: \mu_i \neq \mu_j$, 
at the significance level of α. 


Now we give a step-by-step approach to implementing Tukey’s method discussed earlier. 


PROCEDURE TO FIND (1 − α)100% CONFIDENCE INTERVALS FOR DIFFERENCE OF MEANS WITH 
COMMON SAMPLE SIZE n: TUKEY’S METHOD 

1. There are $\binom{k}{2}$ comparisons of $\mu_i$ versus $\mu_j$. 

2. Compute the following quantities: 

$$\bar{T}_i = \frac{1}{n} \sum_{j=1}^{n} y_{ij}, \quad i = 1, 2, \ldots, k,$$

and 

$$S^2 = \frac{1}{N-k} \sum_{i=1}^{k} \sum_{j=1}^{n} (y_{ij} - \bar{T}_i)^2, \quad \text{where } N = kn.$$

3. From the Studentized range distribution table, find the upper α critical value, $q_{\alpha,k,\nu}$, where 
$\nu = N - k = (n - 1)k$. 
4. For each of the $\binom{k}{2}$ pairs $(i, j)$, $i \neq j$, compute the Tukey interval 

$$\left( (\bar{T}_i - \bar{T}_j) - q_{\alpha,k,\nu} \frac{S}{\sqrt{n}}, \; (\bar{T}_i - \bar{T}_j) + q_{\alpha,k,\nu} \frac{S}{\sqrt{n}} \right).$$

5. Let NR denote insufficient evidence for rejecting $H_0$. Create the following table for each of the $\binom{k}{2}$ 
pairwise differences $\mu_i - \mu_j$, $i \neq j$, and do not reject if the Tukey interval contains the number 0. 
Otherwise reject. 


Table 10.8 is used to summarize the final calculations of the Tukey method. 


In practice, numerous statistical packages implement Tukey’s method. The following 
example is solved using SPSS. The corresponding Minitab commands are given in Example 10.7.3. 


Table 10.8 
μ_i − μ_j    T̄_i − T̄_j    Tukey interval    Observation          Conclusion 
μ_1 − μ_2    T̄_1 − T̄_2                      Doesn't contain 0    Reject 
μ_1 − μ_3    T̄_1 − T̄_3                      Contains 0           Do not reject 

Example 10.5.1 

Table 10.9 shows the 1-year percentage total return of the top five stock funds for five different categories 
(source: Money, July 2000). Which categories have similar top returns and which are different? Use 95% 
Tukey confidence intervals. 


Table 10.9 
Large-cap   Mid-cap   Small-cap   Hybrid   Specialty 
  110.1      299.8      153.8      68.3      181.6 
  102.9      139.0      139.8      67.1      159.3 
   93.1      131.2      138.3      42.5      138.3 
   83.0      110.5      121.4      40.0      132.6 
   83.3      129.2      135.9      41.0      135.7 


Solution 

For simplicity of computation, we will use SPSS (Minitab steps are given in Example 10.7.3). The following is 
the output. 
One-way 


ANOVA 
RETURN 
                  Sum of Squares   df   Mean Square     F      Sig. 
Between Groups      41243.698       4    10310.925    7.397    .001 
Within Groups       27877.580      20     1393.879 
Total               69121.278      24 


Post Hoc Tests 


Multiple Comparisons 

Dependent Variable: RETURN 

Tukey HSD 
(I) FUND  (J) FUND  Mean Difference  Std. Error  Sig.   95% Confidence Interval 
                        (I-J)                            Lower Bound  Upper Bound 
1.00      2.00        -67.4600       23.61253   .066    -138.1175       3.1975 
          3.00        -43.3600       23.61253   .382    -114.0175      27.2975 
          4.00         42.7000       23.61253   .396     -27.9575     113.3575 
          5.00        -55.0200       23.61253   .177    -125.6775      15.6375 
2.00      1.00         67.4600       23.61253   .066      -3.1975     138.1175 
          3.00         24.1000       23.61253   .843     -46.5575      94.7575 
          4.00        110.1600*      23.61253   .001      39.5025     180.8175 
          5.00         12.4400       23.61253   .984     -58.2175      83.0975 
3.00      1.00         43.3600       23.61253   .382     -27.2975     114.0175 
          2.00        -24.1000       23.61253   .843     -94.7575      46.5575 
          4.00         86.0600*      23.61253   .012      15.4025     156.7175 
          5.00        -11.6600       23.61253   .987     -82.3175      58.9975 
4.00      1.00        -42.7000       23.61253   .396    -113.3575      27.9575 
          2.00       -110.1600*      23.61253   .001    -180.8175     -39.5025 
          3.00        -86.0600*      23.61253   .012    -156.7175     -15.4025 
          5.00        -97.7200*      23.61253   .004    -168.3775     -27.0625 
5.00      1.00         55.0200       23.61253   .177     -15.6375     125.6775 
          2.00        -12.4400       23.61253   .984     -83.0975      58.2175 
          3.00         11.6600       23.61253   .987     -58.9975      82.3175 
          4.00         97.7200*      23.61253   .004      27.0625     168.3775 

* The mean difference is significant at the .05 level. 


Homogeneous Subsets 

RETURN 

Tukey HSD(a) 
                 Subset for alpha = .05 
FUND     N          1            2 
4.00     5       51.7800 
1.00     5       94.4800      94.4800 
3.00     5                   137.8400 
5.00     5                   149.5000 
2.00     5                   161.9400 
Sig.               .396         .066 

Means for groups in homogeneous subsets are displayed. 
(a) Uses Harmonic Mean Sample Size = 5.000. 

The Tukey intervals for pairwise differences $(\mu_i - \mu_j)$ are in the foregoing computer printout. For example, 
the Tukey interval for $(\mu_1 - \mu_2)$ is (−138.1, 3.2) and for $(\mu_2 - \mu_4)$ is (39.5, 180.8). Also, sample mean 
and standard deviation are given in the output. For example, 94.48 is the sample mean of the five data 
points of large-cap funds, and 11.97 is the sample standard deviation of the five data points of large-cap 
funds. 


If the Tukey interval for a particular difference $(\mu_i - \mu_j)$ contains the number zero, we do not reject 
$H_0: \mu_i = \mu_j$. Otherwise, we reject $H_0: \mu_i = \mu_j$. For example, the interval for $(\mu_2 - \mu_4)$ is (39.5, 180.8) 
and does not contain zero. Hence we reject $H_0: \mu_2 = \mu_4$. 


The complete table corresponding to step 5 is produced in Table 10.10, where N.R. represents “not reject.” 


Table 10.10 

μ_i − μ_j    T̄_i − T̄_j          Tukey interval      Reject or N.R.   Conclusion 
μ_1 − μ_2    94.48 − 161.94     (−138.1, 3.2)       N.R.             μ_1 = μ_2 
μ_1 − μ_3    94.48 − 137.84     (−114.0, 27.3)      N.R.             μ_1 = μ_3 
μ_2 − μ_3    161.94 − 137.84    (−46.6, 94.8)       N.R.             μ_2 = μ_3 
μ_1 − μ_4    94.48 − 51.78      (−28.0, 113.4)      N.R.             μ_1 = μ_4 
μ_2 − μ_4    161.94 − 51.78     (39.5, 180.8)       R                μ_2 ≠ μ_4 
μ_3 − μ_4    137.84 − 51.78     (15.4, 156.7)       R                μ_3 ≠ μ_4 
μ_1 − μ_5    94.48 − 149.50     (−125.7, 15.6)      N.R.             μ_1 = μ_5 
μ_2 − μ_5    161.94 − 149.50    (−58.2, 83.1)       N.R.             μ_2 = μ_5 
μ_3 − μ_5    137.84 − 149.50    (−82.3, 59.0)       N.R.             μ_3 = μ_5 
μ_4 − μ_5    51.78 − 149.50     (−168.4, −27.1)     R                μ_4 ≠ μ_5 

Based on the 95% Tukey intervals, the average top return of hybrid funds is different from those for mid-cap, 
small-cap, and specialty funds. All other returns are similar. 


In Tukey's method, the confidence coefficient for the set of all pairwise comparisons $\{\mu_i - \mu_j\}$ is 
exactly equal to $1 - \alpha$ when all sample sizes are equal. For unequal sample sizes, the confidence 
coefficient is greater than $1 - \alpha$. In this sense, Tukey’s procedure is conservative when the sample 
sizes are not equal. In the case of unequal sample sizes, one has to estimate the standard deviation 
for each pairwise comparison. Tukey's procedure for unequal sample sizes is sometimes referred to 
as the Tukey-Kramer method. 
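
For the equal-sample-size case, the Tukey intervals of Example 10.5.1 can also be reproduced with 
a short script. The following is a minimal sketch using only the Python standard library. The 
function name tukey_intervals is hypothetical, and the critical value 4.232 is an assumed entry 
read from a Studentized range table for α = 0.05, k = 5, and ν = 20; it is not computed by the 
code. 

```python
from itertools import combinations

# Data of Table 10.9 (1-year percentage total returns).
returns = {
    "Large-cap": [110.1, 102.9, 93.1, 83.0, 83.3],
    "Mid-cap": [299.8, 139.0, 131.2, 110.5, 129.2],
    "Small-cap": [153.8, 139.8, 138.3, 121.4, 135.9],
    "Hybrid": [68.3, 67.1, 42.5, 40.0, 41.0],
    "Specialty": [181.6, 159.3, 138.3, 132.6, 135.7],
}

Q_CRIT = 4.232  # assumed table value of q for alpha = 0.05, k = 5, nu = 20


def tukey_intervals(groups, q_crit):
    """Tukey intervals for all pairwise mean differences (equal n only)."""
    names = list(groups)
    k = len(names)
    n = len(groups[names[0]])                     # common sample size
    means = {g: sum(v) / n for g, v in groups.items()}
    sse = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    s = (sse / (k * (n - 1))) ** 0.5              # pooled standard deviation
    half_width = q_crit * s / n ** 0.5
    intervals = {}
    for gi, gj in combinations(names, 2):
        diff = means[gi] - means[gj]
        intervals[(gi, gj)] = (diff - half_width, diff + half_width)
    return intervals


for pair, (low, high) in tukey_intervals(returns, Q_CRIT).items():
    decision = "N.R." if low <= 0.0 <= high else "R"
    print(pair, round(low, 1), round(high, 1), decision)
```

Pairs whose interval excludes zero are flagged R, which mirrors the reject column of the summary 
table; small differences from the printed SPSS bounds can arise from rounding of the critical value. 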


EXERCISES 10.5 


10.5.1. A large insurance company wants to determine whether there is a difference in the average 
time to process claim forms among its four different processing facilities. The data in Table 
10.5.1 represent weekly average number of days to process a form over a period of 4 weeks. 


Table 10.5.1 
Facility 1 Facility 2  Facility3 Facility 4 


1.50 2.25 1.30 2.0 
0.9 1.85 2.75 1.5 
1.12 1.45 2.15 2.85 
1.95 2.15 1.55 1.15 


(a) Test whether there is a difference in the average processing times at the 0.05 level. 

(b) Test whether there is a difference, using Tukey’s method to find which facilities are 
different. 

(c) Interpret your results and state any assumptions you have made in solving the problem. 


10.5.2. Table 10.5.2 gives the rental vacancy rates by U.S. region (source: U.S. Census Bureau) for 
5 years. 


Table 10.5.2 
Rental units   1995   1996   1997   1998   1999 
Northeast       7.2    7.4    6.7    6.7    6.3 
Midwest         7.2    7.9    8.0    7.9    8.6 
South           8.3    8.6    9.1    9.6   10.3 
West            7.5    7.2    6.6    6.7    6.2 


(a) Test at the 0.01 level whether the true rental vacancy rates by region are the same for all 
5 years. 
(b) If there is a difference, use Tukey's method to find which regions are different. 


10.5.3. Table 10.5.3 gives lower limits of income (approximated to nearest $1000 and calculated 
as of March of the following year) by race for the top 5% of U.S. households from 1994 to 
1998. (Source: U.S. Census Bureau.) 


Table 10.5.3 
Race         1994   1995   1996   1997   1998 
All races     110    113    120    127    132 
White         113    117    123    130    136 
Black          81     80     85     87     94 
Hispanic       82     80     86     93     98 


(a) Test at the 0.05 level whether the true lower limits of income for the top 5% of U.S. 
households for each race are the same for all 5 years. 

(b) If there is a difference, use Tukey's method to find which is different. 

(c) Interpret your results and state any assumptions you have made in solving the problem. 


10.5.4. The data in Table 10.5.4 represent the mean serum cholesterol levels (given in milligrams 
per deciliter) by race and age in the United States from 1978 to 1980 (source: “Report of 
the National Cholesterol Education Program Expert Panel on Detection, Evaluation, and 
Treatment of High Blood Cholesterol in Adults,” Arch. Intern. Med. 148, Jan. 1988). 


Table 10.5.4 
Race Age 
20-24 25-34 35-44 45-54 55-64 65-74 
All races 180 199 217 227 229 221 
White 180 199 217 227 230 222 
Black 171 199 218 229 223 217 


(a) Test at the 0.01 level whether the true mean cholesterol levels for all races in the United 
States during 1978-1980 are the same. 

(b) If there is a difference, use Tukey’s method to find which of the races are different with 
respect to the mean cholesterol levels. 


10.6 CHAPTER SUMMARY 


In this chapter, we have introduced the basic idea of analyzing various experimental designs. In 
Section 10.3, we explained the one-way analysis of variance for the hypothesis testing problem for 
more than two means (different treatments being applied, or different populations being sampled). 
The two-way analysis of variance for a randomized block design, consisting of b blocks of k 
experimental units each, is discussed in Section 10.4. In Section 10.5, we describe one popular 
procedure for multiple comparisons, called Tukey's method, for the completely randomized, one-factor 
design. We saw in Chapter 9 that there are other possible designs, such as the Latin square design or 
Taguchi methods. We refer to specialized books on experimental design (Hicks and Turner) for more 
details on how to conduct ANOVA on such designs. In the final section, we give some computational 
examples. 


We now list some of the key definitions introduced in this chapter: 


■ Completely randomized experimental design 
■ Randomized block design 
■ Studentized range distribution 
■ Tukey-Kramer method 


In this chapter, we also learned the following important concepts and procedures: 


■ Analysis of variance procedure for two treatments 
■ One-way analysis of variance for k > 2 populations 
■ One-way analysis of variance procedure for k > 2 populations 
■ Procedure to find (1 − α)100% confidence intervals for difference of means with common 
sample size n; Tukey's method 
■ Computational procedure for randomized block design 


10.7 COMPUTER EXAMPLES 


Minitab, SPSS, SAS, and other statistical programming packages are especially useful when we perform 
an analysis of variance. As we have experienced in earlier sections, an ANOVA computation is very 
tedious to complete by hand. 


10.7.1 Minitab Examples 


Example 10.7.1 
(One-way ANOVA): The three random samples in Table 10.11 are independently obtained from three 
different normal populations with equal variances. 
At the α = 0.05 level of significance, test for equality of means. 


Table 10.11 

Sample 1   Sample 2   Sample 3 
   64         56         81 
   84         74         92 
   75         69         84 
   77 
   80 

Solution 
Enter sample 1 data in C1, sample 2 in C2, and sample 3 in C3. 


Stat > ANOVA > One-way (unstacked)... > in Responses (in separate columns): type C1 C2 C3 
and click OK 


We get the following output: 


One-Way Analysis of Variance 


Analysis of Variance 


Source   DF       SS      MS      F       P 
Factor    2    560.7   280.3   4.84   0.042 
Error     8    463.3    57.9 
Total    10   1024.0 


Individual 95% CIs For Mean 
Based on Pooled StDev 

Level   N     Mean    StDev 
C1      5   76.000    7.517 
C2      3   66.333    9.292 
C3      3   85.667    5.686 

Pooled StDev = 7.610 

[Character plot of the individual 95% confidence intervals for the three means, on an axis from 60 to 96.] 


We can see that the output contains SS, MS, individual column means, and standard deviation values. Also, 
the F-value gives the value of the test statistic, and the p-value is obtained as 0.042. Comparing this p-value 
of 0.042 with α = 0.05, we will reject the null hypothesis. 
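
As a cross-check that does not depend on any particular package, the sums of squares and the 
F-statistic for these data can be computed directly. The following sketch uses plain Python; the 
function name one_way_anova is hypothetical, and the code handles unequal sample sizes as in 
Table 10.11. 

```python
def one_way_anova(samples):
    """Return (SS factor, SS error, F) for a one-way layout (sketch).

    samples is a list of lists; the groups may have different sizes.
    """
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    group_means = [sum(s) / len(s) for s in samples]

    ss_factor = sum(len(s) * (m - grand_mean) ** 2
                    for s, m in zip(samples, group_means))
    ss_error = sum((x - m) ** 2
                   for s, m in zip(samples, group_means) for x in s)

    ms_factor = ss_factor / (k - 1)
    ms_error = ss_error / (n_total - k)
    return ss_factor, ss_error, ms_factor / ms_error


# Data of Table 10.11.
table_10_11 = [[64, 84, 75, 77, 80], [56, 74, 69], [81, 92, 84]]
ss_factor, ss_error, f_stat = one_way_anova(table_10_11)
print(round(ss_factor, 1), round(ss_error, 1), round(f_stat, 2))
```

The three printed quantities correspond to the Factor SS, Error SS, and F entries of the software 
output. 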


If we want to create side-by-side box plots to graphically check homogeneity of variances, we can do the 
following. 
Enter all the data (from all three samples) in C1, and enter the sample identifier number in C2 (that is, 1 if 
the data belong to sample 1, 2 for sample 2, and 3 for sample 3). 


Graph > Boxplot > in Y column, type C1 and in X column, type C2 > click OK 


Then as in Example 10.3.2, interpret the resulting box plots. 




Example 10.7.2 
Give Minitab steps for randomized block design for the data of Example 10.4.1. 


Solution 

To put the data into the format for Minitab, place all the data values in one column (say, C2). Let numbers 
1, 2, 3, 4 represent the chemicals and numbers 1, 2, 3 represent the fabric material. In one column (say, C1) 
place numbers 1 through 4 with respect to the data values identifying the factor (chemical) used. In another 
column (say, C3) place corresponding numbers 1 through 3 to identify the second factor (material) used. See 


Table 10.12. 

Table 10.12 

C1 C2 C3 

chemical response material 
1 3 1 
2 9 1 
3 2 1 
4 7 1 
1 7 2 
2 11 2 
3 5 2 
4 9 2 
1 6 3 
2 8 3 
3 7 3 
4 8 3 




Then do the following: 


Stat > ANOVA > Two-way... > in Response: type C2, in Row Factor: type C1, and in Column 
factor: type C3 > OK 


We will get the following output. 


Two-Way Analysis of Variance 


Analysis of Variance for Response 
Source     DF      SS      MS      F       P 
Chemical    3   43.67   14.56   7.38   0.019 
Material    2   16.17    8.08   4.10   0.075 
Error       6   11.83    1.97 
Total      11   71.67 


Note that the output contains p-values for the effect both of the chemicals and of the materials. Because the 
p-value of 0.019 is less than α = 0.05, we reject the null hypothesis and conclude that there is a difference in 
mean resistance among the four chemicals. For the materials, the p-value of 0.075 is greater than α = 0.05, 
so we do not reject the null hypothesis; there is not enough evidence to conclude that the mean resistance 
differs among the materials. 


Example 10.7.3 
Give the Minitab steps for using Tukey’s method for the data of Example 10.5.1. 


Solution 

In order to use Tukey's method, it is necessary to enter the data in a particular way. Enter all the data points
in column C1: the first five from large-cap, the next five from mid-cap, and so on, with the last five from specialty.
In column C2, enter the number identifying the group of each data point: the first five numbers are 1 (identifying
the data as belonging to large-cap), the next five numbers are 2, and so on; the last five numbers are 5. Then:


Stat > ANOVA > One-way... > Comparisons... > click Tukey's, family error rate: and type 5 (to
represent a 100α% error rate) > OK > in Response: type C1, and in Factor: type C2 > OK


We will get output similar to that given in the solution part of Example 10.5.1. For discussion of the
output, refer to Example 10.5.1.


10.7.2 SPSS Examples 




Example 10.7.4 
Conduct a one-way ANOVA for the data of Example 10.7.1. Use the α = 0.05 level of significance, and test for
equality of means.

Solution

In SPSS, we need to enter the data in a special way. First name column C1 as Sample, and column C2 as
Values. In the Sample column, enter the numbers identifying the group from which each data value comes. In
this case, enter 1 in the first five rows, 2 in the next three rows, and 3 in the last three rows. In the Values
column, enter the sample 1 data in the first five rows, the sample 2 data in the next three rows, and the
sample 3 data in the last three rows. Then:


Analyze > Compare Means > One-way ANOVA... > Bring Values to Dependent List: and Sample
to Factor: > OK


We will get the following output. 


ANOVA VALUES

                  Sum of Squares    df    Mean Square      F       Sig.
Between Groups           560.667     2        280.333    4.840    .042
Within Groups            463.333     8         57.917
Total                   1024.000    10


Because the Sig. value 0.042 is less than α = 0.05, we reject the null hypothesis and conclude that the
population means are not all equal.




Example 10.7.5 
Give the SPSS steps for using Tukey’s method for the data of Example 10.5.1. 


Solution 

First name column C1 as Fund and column C2 as Return. In the Fund column, enter the numbers identifying
the group from which each data value comes. In this case, the first five numbers are 1 (identifying the data
as belonging to large-cap), the next five numbers are 2, and so on, until the last five numbers are 5. In the
Return column, enter the large-cap return data in the first five rows, the mid-cap data in the next five rows,
and so on, with the last five from specialty. Then:


Analyze > Compare Means > One-way ANOVA... > Bring Return to Dependent List: and Fund to
Factor: > Click Post Hoc... > click Tukey > click Continue > OK


We will get the output as in Example 10.5.1. 
Interpretation of output is given in Example 10.5.1. When the treatment effects are significant, as in this 
example where the p-value is 0.001, the means must then be further examined to determine the nature 
of the effects. There are procedures called post hoc tests to assist the researcher in this task. For example, 
looking at the output column Sig., we could observe that there are significant differences in the mean returns 
between funds 2 and 4, and funds 4 and 5. 

10.7.3 SAS Examples

Example 10.7.6 
Using SAS, conduct a one-way ANOVA for the data of Example 10.7.1. Use the α = 0.05 level of significance,
and test for equality of means.


Solution 
We could use the following code. 


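OPTIONS NODATE NONUMBER;
OPTIONS LS=80 PS=50;
/* Read (Sample, Value) pairs; @@ allows several pairs on one data line */
DATA Scores;
INPUT Sample Value @@;
DATALINES;
1 64 1 84 1 75 1 77 1 80
2 56 2 74 2 69
3 81 3 92 3 84
;
/* One-way ANOVA of Value on the classification variable Sample */
PROC ANOVA DATA=Scores;
TITLE 'ANOVA for Scores';
CLASS Sample;
MODEL Value=Sample;
MEANS Sample;
RUN;

We will get the following output: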


ANOVA for Scores
The ANOVA Procedure

Class Level Information

Class     Levels    Values
Sample         3    1 2 3

Number of observations    11

The ANOVA Procedure
Dependent Variable: Value

                              Sum of
Source             DF        Squares    Mean Square    F Value    Pr > F
Model               2     560.666667     280.333333       4.84    0.0419
Error               8     463.333333      57.916667
Corrected Total    10    1024.000000

R-Square    Coeff Var    Root MSE    Value Mean
0.547526     10.01355    7.610300      76.00000

Source    DF       Anova SS    Mean Square    F Value    Pr > F
Sample     2    560.6666667    280.3333333       4.84    0.0419

The ANOVA Procedure

Level of         ------------Value------------
Sample     N           Mean          Std Dev
1          5     76.0000000       7.51664819
2          3     66.3333333       9.29157324
3          3     85.6666667       5.68624070


Because the p-value 0.0419 is less than α = 0.05, we reject the null hypothesis.


We could have used PROC GLM instead of PROC ANOVA to perform the ANOVA procedure. Usually, PROC 
ANOVA is used when the sizes of the samples are equal; otherwise PROC GLM is more desirable. The next 
example will show how to do the multiple comparison using Tukey’s procedure. 
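
As a cross-check on the SPSS and SAS listings for this data set, the one-way ANOVA table can be reproduced directly from the definitions of the between-group and within-group sums of squares. The following is a minimal Python sketch under that assumption; the name `one_way_anova` and the dictionary layout are illustrative, and the p-value is omitted because it needs an F-distribution routine.

```python
# One-way ANOVA for the three samples entered in the SAS DATALINES above.
groups = {
    1: [64, 84, 75, 77, 80],
    2: [56, 74, 69],
    3: [81, 92, 84],
}


def one_way_anova(samples):
    """Return (ss_between, ss_within, df_between, df_within, f_value)."""
    all_values = [y for ys in samples.values() for y in ys]
    n, k = len(all_values), len(samples)
    grand_mean = sum(all_values) / n

    ss_between = 0.0
    ss_within = 0.0
    for ys in samples.values():
        mean = sum(ys) / len(ys)
        ss_between += len(ys) * (mean - grand_mean) ** 2
        ss_within += sum((y - mean) ** 2 for y in ys)

    df_between, df_within = k - 1, n - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, df_between, df_within, f_value


ss_b, ss_w, df_b, df_w, f_value = one_way_anova(groups)
print(f"Between: SS={ss_b:.3f} df={df_b}")
print(f"Within:  SS={ss_w:.3f} df={df_w}")
print(f"F = {f_value:.3f}")
```

Because the group sizes are unequal (5, 3, 3), the between-group sum of squares weights each squared deviation by its own sample size.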



Example 10.7.7 
Give the SAS commands for using Tukey’s method for the data of Example 10.5.1. 


Solution 
We could use the following code. 


OPTIONS NODATE NONUMBER;
OPTIONS LS=80 PS=50;
/* Read (Fund, Return) pairs; each data line holds one return per fund */
DATA Mfundrtn;
INPUT Fund Return @@;
DATALINES;
1 110.1 2 299.8 3 153.8 4 68.3 5 181.6
1 102.9 2 139.0 3 139.8 4 67.1 5 159.3
1 93.1 2 131.2 3 138.3 4 42.5 5 138.3
1 83.3 2 129.2 3 135.9 4 41.0 5 135.7
1 83.0 2 110.5 3 121.4 4 40.0 5 132.6
;
/* One-way ANOVA followed by Tukey's multiple comparison of the fund means */
PROC GLM DATA=Mfundrtn;
TITLE 'ANOVA for Mutual fund returns';
CLASS Fund;
MODEL Return=Fund;
MEANS Fund / TUKEY;
RUN;


ANOVA for Mutual fund returns
The GLM Procedure

Class Level Information

Class    Levels    Values
Fund          5    1 2 3 4 5

Number of observations    25

ANOVA for Mutual fund returns
The GLM Procedure

Dependent Variable: Return

                               Sum of
Source             DF         Squares    Mean Square    F Value    Pr > F
Model               4     41243.69840    10310.92460       7.40    0.0008
Error              20     27877.58000     1393.87900
Corrected Total    24     69121.27840

R-Square    Coeff Var    Root MSE    Return Mean
0.596686     31.34524    37.33469       119.1080

Source    DF      Type I SS    Mean Square    F Value    Pr > F
Fund       4    41243.69840    10310.92460       7.40    0.0008

Source    DF    Type III SS    Mean Square    F Value    Pr > F
Fund       4    41243.69840    10310.92460       7.40    0.0008


ANOVA for Mutual fund returns
The GLM Procedure

Tukey's Studentized Range (HSD) Test for Return

NOTE: This test controls the Type I experimentwise error rate, but it
generally has a higher Type II error rate than REGWQ.

Alpha                                      0.05
Error Degrees of Freedom                     20
Error Mean Square                      1393.879
Critical Value of Studentized Range     4.23186
Minimum Significant Difference           70.658

Means with the same letter are not significantly different.

Tukey Grouping      Mean    N    Fund
     A            161.94    5    2
     A
     A            149.50    5    5
     A
     A            137.84    5    3
     A
B    A             94.48    5    1
B
B                  51.78    5    4


Looking at the p-value of 0.0008, which is less than α = 0.05, we conclude that there is a difference in mean
mutual fund returns.


In the previous example, we used Tukey's post hoc test. We could have used other options such as DUNCAN,
SNK, LSD, and SCHEFFE. The test is performed at the default value of α = 0.05. If we want to specify, say,
α = 0.01 or 0.1, we can do so with a statement such as MEANS Fund / TUKEY ALPHA=0.01;.


If we need all the confidence intervals in the Tukey method, then in the code just given we modify 'MEANS
Fund / TUKEY;' to 'MEANS Fund / LSD TUKEY CLDIFF;', which will result in the following output.


ANOVA for Mutual fund returns
The GLM Procedure

Class Level Information

Class    Levels    Values
Fund          5    1 2 3 4 5

Number of observations    25

ANOVA for Mutual fund returns
The GLM Procedure

Dependent Variable: Return

                               Sum of
Source             DF         Squares    Mean Square    F Value    Pr > F
Model               4     41243.69840    10310.92460       7.40    0.0008
Error              20     27877.58000     1393.87900
Corrected Total    24     69121.27840

R-Square    Coeff Var    Root MSE    Return Mean
0.596686     31.34524    37.33469       119.1080

Source    DF      Type I SS    Mean Square    F Value    Pr > F
Fund       4    41243.69840    10310.92460       7.40    0.0008

Source    DF    Type III SS    Mean Square    F Value    Pr > F
Fund       4    41243.69840    10310.92460       7.40    0.0008
ANOVA for Mutual fund returns
The GLM Procedure
t-tests (LSD) for Return

NOTE: This test controls the Type I comparisonwise error rate, not the
experimentwise error rate.

Alpha                              0.05
Error Degrees of Freedom             20
Error Mean Square              1393.879
Critical Value of t             2.08596
Least Significant Difference     49.255

Comparisons significant at the 0.05 level are indicated by ***.

              Difference
Fund             Between      95% Confidence
Comparison         Means          Limits
2 - 5              12.44     -36.81     61.69
2 - 3              24.10     -25.15     73.35
2 - 1              67.46      18.21    116.71   ***
2 - 4             110.16      60.91    159.41   ***
5 - 2             -12.44     -61.69     36.81
5 - 3              11.66     -37.59     60.91
5 - 1              55.02       5.77    104.27   ***
5 - 4              97.72      48.47    146.97   ***
3 - 2             -24.10     -73.35     25.15
3 - 5             -11.66     -60.91     37.59
3 - 1              43.36      -5.89     92.61
3 - 4              86.06      36.81    135.31   ***
1 - 2             -67.46    -116.71    -18.21   ***
1 - 5             -55.02    -104.27     -5.77   ***
1 - 3             -43.36     -92.61      5.89
1 - 4              42.70      -6.55     91.95
4 - 2            -110.16    -159.41    -60.91   ***
4 - 5             -97.72    -146.97    -48.47   ***
4 - 3             -86.06    -135.31    -36.81   ***
4 - 1             -42.70     -91.95      6.55


ANOVA for Mutual fund returns
The GLM Procedure
Tukey's Studentized Range (HSD) Test for Return

NOTE: This test controls the Type I experimentwise error rate.

Alpha                                      0.05
Error Degrees of Freedom                     20
Error Mean Square                      1393.879
Critical Value of Studentized Range     4.23186
Minimum Significant Difference           70.658

Comparisons significant at the 0.05 level are indicated by ***.

              Difference
Fund             Between     Simultaneous 95%
Comparison         Means     Confidence Limits
2 - 5              12.44     -58.22     83.10
2 - 3              24.10     -46.56     94.76
2 - 1              67.46      -3.20    138.12
2 - 4             110.16      39.50    180.82   ***
5 - 2             -12.44     -83.10     58.22
5 - 3              11.66     -59.00     82.32
5 - 1              55.02     -15.64    125.68
5 - 4              97.72      27.06    168.38   ***
3 - 2             -24.10     -94.76     46.56
3 - 5             -11.66     -82.32     59.00
3 - 1              43.36     -27.30    114.02
3 - 4              86.06      15.40    156.72   ***
1 - 2             -67.46    -138.12      3.20
1 - 5             -55.02    -125.68     15.64
1 - 3             -43.36    -114.02     27.30
1 - 4              42.70     -27.96    113.36
4 - 2            -110.16    -180.82    -39.50   ***
4 - 5             -97.72    -168.38    -27.06   ***
4 - 3             -86.06    -156.72    -15.40   ***
4 - 1             -42.70    -113.36     27.96
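
The two thresholds reported by SAS, the least significant difference and Tukey's minimum significant difference, both follow from the error mean square and the common sample size. The sketch below is a rough Python illustration that assumes equal group sizes; the critical values 2.08596 and 4.23186 are copied from the SAS listing rather than computed, and the helper name `significant_pairs` is hypothetical.

```python
import math
from itertools import combinations

# Group means, error mean square, and critical values as reported by SAS.
means = {1: 94.48, 2: 161.94, 3: 137.84, 4: 51.78, 5: 149.50}
n_per_group = 5
mse = 1393.879
t_crit = 2.08596   # t critical value with 20 error d.f. (from the listing)
q_crit = 4.23186   # studentized range critical value (from the listing)

# LSD compares one pair at a time; Tukey's HSD controls the family error rate.
lsd = t_crit * math.sqrt(2 * mse / n_per_group)
msd = q_crit * math.sqrt(mse / n_per_group)


def significant_pairs(group_means, threshold):
    """Pairs of groups whose mean difference exceeds the threshold."""
    return [
        (a, b)
        for a, b in combinations(sorted(group_means), 2)
        if abs(group_means[a] - group_means[b]) > threshold
    ]


lsd_pairs = significant_pairs(means, lsd)
tukey_pairs = significant_pairs(means, msd)
print(f"LSD = {lsd:.3f}, significant pairs: {lsd_pairs}")
print(f"MSD = {msd:.3f}, significant pairs: {tukey_pairs}")
```

Tukey's threshold is larger than the LSD, so it flags fewer pairs; this is the price of controlling the experimentwise rather than the comparisonwise error rate.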


EXERCISES 10.7 


10.7.1. For the data of Exercise 10.5.4, perform a one-way analysis of variance using any of the 
software (Minitab, SPSS, or SAS). 


10.7.2. For the data of Exercise 10.5.2, perform Tukey’s test using any of the software (Minitab, SPSS, 
or SAS). 


10.7.3. For the data of Exercise 10.5.4, perform Tukey’s test using any of the software (Minitab, SPSS, 
or SAS). 


PROJECTS FOR CHAPTER 10 


10A. Transformations 

The basic model for the analysis of variance requires that the independent observations come from 
normal populations with equal variances. These requirements are rarely met in practice, and the extent 
to which they are violated affects the validity of the subsequent inference. Therefore, it is important 
for the investigator to decide whether the assumptions are at least approximately satisfied and, if not,
what can be done to rectify the situation. Hence it is necessary to (a) examine the data for marked 
departures from the model and, if necessary, (b) apply an appropriate transformation to the data to 
bring it more in line with the basic assumptions. 


A simple way to check for the equality of the population variances is to calculate the sample variances
and plot them against the corresponding sample means, as in Figure 10.3. If the graph suggests a relation
between sample mean and variance, then such a relation very likely exists between the population mean and
variance, and hence the populations from which the samples are taken may very well be nonnormal.


If a study of sample means and variances reveals a marked departure from the model, the observations
may be transformed into a new set to which the methods of ANOVA are better suited. Three commonly
used transformations are the following:


(a) The logarithmic transformation: If the graph of sample means against sample variances suggests
a relation of the form

$$s^2 = C(\bar{x})^2,$$

replace each observation X by its logarithm to the base 10,

$$Y = \log_{10} X,$$

or, if some X-values are zero, by $Y = \log_{10}(X + 1)$.

(b) The square root transformation: If the relation is of the form

$$s^2 = C\bar{x},$$

replace X by its square root,

$$Y = \sqrt{X},$$

or, if the values of X are very close to zero, by the square root of (X + 1/2). This relation is
found in data from Poisson populations, where the variance is equal to the mean.

(c) The angular transformation: If the observations are counts of a binomial nature, and p is the
observed proportion, replace p by

$$\theta = \arcsin\sqrt{p},$$

which is the principal angle (in degrees or radians) whose sine is the square root of p.

(i) To check for the equality of the population variances, calculate the sample variances 
for each of the data sets given in the exercises of Section 10.3 and plot against the 
corresponding mean. 

(ii) If the assumptions appear to be violated, perform one of the transformations described earlier
and carry out the analysis of variance procedure for the transformed data.
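
For readers who want to try the project numerically, the following Python sketch shows one possible way to tabulate the sample means and variances and to apply the three transformations. All function names are illustrative assumptions, and the choice of when to add the shift (1 for the logarithm, 1/2 for the square root) is left to the user, as in the text.

```python
import math
import statistics


def mean_variance(sample):
    """Sample mean and sample variance, the pair plotted as in Figure 10.3."""
    return statistics.mean(sample), statistics.variance(sample)


def log_transform(sample):
    """Base-10 logarithm; uses log10(X + 1) if any observation is zero."""
    shift = 1 if any(x == 0 for x in sample) else 0
    return [math.log10(x + shift) for x in sample]


def sqrt_transform(sample, near_zero=False):
    """Square root; uses sqrt(X + 1/2) when the values are very close to zero."""
    shift = 0.5 if near_zero else 0.0
    return [math.sqrt(x + shift) for x in sample]


def angular_transform(proportions, degrees=False):
    """Arcsine of the square root of each observed proportion p."""
    angles = [math.asin(math.sqrt(p)) for p in proportions]
    return [math.degrees(a) for a in angles] if degrees else angles


# Hypothetical illustration: counts containing a zero, and a few proportions.
print(mean_variance([1, 2, 3]))
print(log_transform([0, 9, 99]))
print(sqrt_transform([4, 9]))
print(angular_transform([0.25, 0.5], degrees=True))
```

After transforming, the same mean-variance plot can be redrawn to see whether the relation between mean and variance has been weakened.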
10B. ANOVA with Missing Observations


In the two-way analysis of variance, we assumed that each block cell has one treatment value. However, 
it is possible that some observations in some block cells may be missing for various reasons, such as 
that the investigator failed to record the observations, the subject discontinued participation in the 
experiment, or the subject moved to a different place or died prior to completion of the experiment. 
In those cases, this project gives a method of inserting estimates of the missing values. 


Let $y_{..}$ denote the total of all kb observations. If the observation corresponding to the ith row and the
jth column, which is denoted by $y_{ij}$, is missing, then all the sums of squares are calculated as before,
except that the $y_{ij}$ term is replaced by

$$\hat{y}_{ij} = \frac{bB'_j + kT'_i - y'_{..}}{(k-1)(b-1)},$$

where $T'_i$ denotes the total of the b − 1 observations in the ith row, $B'_j$ denotes the total of the k − 1
observations in the jth column, and $y'_{..}$ denotes the sum of all kb − 1 observations. Using calculus, one can
show that $\hat{y}_{ij}$ minimizes the error sum of squares. One should not include these estimates when computing
relevant degrees of freedom. With these changes, proceed to perform the analysis as in Section 10.4.
For more details on the method, refer to Sahai and Ageel (2000), p. 145.


Perform the test of Example 10.4.1, now with a missing value for material III and chemical C4. Does 
the conclusion change? 
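
As a starting point for this project, the missing-value estimate can be computed mechanically. The Python sketch below assumes the standard form of the estimate for a layout with k rows and b columns, with denominator (k − 1)(b − 1); the function name `estimate_missing` is hypothetical. The table used is the chemical-by-material data as coded in Table 10.12, with the cell for chemical 4 and material 3 treated as missing. Whether chemicals are taken as rows or columns does not matter, because the formula is symmetric in the two factors.

```python
def estimate_missing(table, i, j):
    """Estimate the single missing cell table[i][j] (stored as None).

    table has k rows and b columns; the estimate is
    (k * T_i + b * B_j - G) / ((k - 1) * (b - 1)),
    where T_i, B_j, and G are the row, column, and grand totals
    of the observations that are present.
    """
    k, b = len(table), len(table[0])
    row_total = sum(v for v in table[i] if v is not None)
    col_total = sum(row[j] for row in table if row[j] is not None)
    grand_total = sum(v for row in table for v in row if v is not None)
    return (k * row_total + b * col_total - grand_total) / ((k - 1) * (b - 1))


# Rows: chemicals 1-4; columns: materials 1-3 (coding of Table 10.12).
table = [
    [3, 7, 6],
    [9, 11, 8],
    [2, 5, 7],
    [7, 9, None],   # chemical 4, material 3 treated as missing
]

y_hat = estimate_missing(table, 3, 2)
print(f"Estimated missing value: {y_hat:.4f}")
```

When the analysis is rerun with the estimate inserted, the error degrees of freedom should be reduced by one, since the estimate is not an independent observation.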


10C. ANOVA in Linear Models 


In order to determine whether the multiple regression model introduced in Section 8.5 is adequate
for predicting values of the dependent variable y, one can use the analysis of variance F-test. The
model is

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + \varepsilon,$$

where $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n) \sim N(0, \sigma^2)$ and $\varepsilon_i$ and $\varepsilon_j$ are uncorrelated if $i \neq j$. Define the multiple
coefficient of determination, $R^2$, by

$$R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2}.$$
The Analysis of Variance F-Test

$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ versus
$H_a$: At least one of the parameters, $\beta_1, \beta_2, \ldots, \beta_k$, differs from 0.

Test statistic:

$$F = \frac{\text{Mean square for model}}{\text{Mean square for error}} = \frac{SS(\text{model})/k}{SSE/[n - (k + 1)]} = \frac{R^2/k}{(1 - R^2)/[n - (k + 1)]},$$

where

n = number of observations
k = number of parameters in the model excluding $\beta_0$.


From the F-table, determine the value of $F_\alpha$ with k numerator d.f. and n − (k + 1) denominator d.f.
Then the rejection region is $\{F > F_\alpha\}$.


If we reject the null hypothesis, then the model can be taken as useful in predicting values of y. 


For the data of Example 8.5.1, test the overall utility of the fitted model

$$\hat{y} = 66.12 - 0.3794x_1 + 21.4365x_2$$


using the F-test described earlier. 
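
The test statistic above can be computed either from the sums of squares or from R² alone. The following Python sketch illustrates both routes; the function names and the small data set are hypothetical and are not the data of Example 8.5.1. The critical value F_α must still be read from an F-table.

```python
def r_squared(y, y_hat):
    """Multiple coefficient of determination, 1 - SSE / SST."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


def overall_f_from_r2(r2, n, k):
    """F = (R^2 / k) / ((1 - R^2) / (n - (k + 1)))."""
    return (r2 / k) / ((1 - r2) / (n - (k + 1)))


def overall_f_from_ss(ss_model, sse, n, k):
    """F = (SS(model) / k) / (SSE / (n - (k + 1)))."""
    return (ss_model / k) / (sse / (n - (k + 1)))


# Hypothetical illustration: observed values and fitted values from some model.
y = [1.0, 2.0, 3.0, 4.0]
y_fit = [1.5, 1.5, 3.5, 3.5]
print(r_squared(y, y_fit))

# With R^2 = 0.5, n = 13 observations, and k = 2 predictors:
print(overall_f_from_r2(0.5, n=13, k=2))
print(overall_f_from_ss(ss_model=50.0, sse=50.0, n=13, k=2))
```

The two F functions agree whenever R² = SS(model)/SST, which is why the last two printed values coincide.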
The Reverend Thomas Bayes (1702-1761) was a Nonconformist minister. In the 1720s Bayes started
working on the theory of probability. Even though he did not publish any of his works on mathematics
during his lifetime, Bayes was elected a Fellow of the Royal Society in 1742. His famous work
titled “Essay toward solving a problem in the doctrine of chances” was published in the Philosophical
Transactions of the Royal Society of London in 1764, after his death. The paper was sent to the Royal
Society by Richard Price, a friend of Bayes. Another mathematical publication on asymptotic series
also appeared after his death.


11.1 INTRODUCTION 


Bayesian procedures are becoming increasingly popular in building statistical models for real-world
problems. In recent years, Bayesian statistical methods have been increasingly used in scientific
fields ranging from archaeology to computing. Bayesian inference is a method of analysis that combines
information collected from experimental data with the knowledge one has prior to performing
the experiment. Bayesian and classical (frequentist) methods take fundamentally different outlooks toward
statistical inference. In the Bayesian approach to statistics, uncertainties are expressed in terms of
probabilities, and we combine any new information that is available with the prior
information we have to form the basis for the statistical procedure. The classical approach to statistical
inference that we have studied so far is based on the random sample alone. That is, if a probability
distribution depends on a set of parameters θ, the classical approach makes inferences about θ
solely on the basis of a sample $X_1, \ldots, X_n$. This approach to inference is based on the concept of
a sampling distribution. To correctly interpret traditional inferential procedures, it is necessary to
fully understand the notion of a sampling distribution. In this approach, we analyze only one set
of sample values. However, we have to imagine what could happen if we drew a large number of
random samples from the population. For example, consider a normal sample with known variance σ².
We have seen that a 95% confidence interval for the population mean μ is given by the random
interval $(\bar{X} - 1.96\sigma/\sqrt{n},\ \bar{X} + 1.96\sigma/\sqrt{n})$. This means that when samples are repeatedly taken from
the population, approximately 95% of the random intervals contain the true mean μ. The classical inferential
approach does not use any of the prior information we might have as a result of, say, our familiarity
with the problem, or information from earlier studies. Scientists and engineers are faced with the
problem that there is typically only a single data set, and they need to determine the value of the
parameter at the time the data are taken. The basic question then is, “What is the best estimate of a
parameter one can make from the data using one’s prior information?” Statistical approaches that use
prior knowledge, possibly subjective, in addition to the sample evidence to estimate the population
parameters are known as Bayesian methods.
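
The repeated-sampling interpretation of a confidence interval can be made concrete by simulation. The following Python sketch draws many normal samples with known σ and records how often the 95% interval covers the true mean. The values of μ, σ, n, the number of replications, and the seed are arbitrary assumptions chosen only for illustration.

```python
import math
import random


def coverage(mu=10.0, sigma=2.0, n=25, reps=4000, seed=1):
    """Fraction of 95% intervals (known sigma) that contain the true mean."""
    rng = random.Random(seed)
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - half_width <= mu <= x_bar + half_width:
            hits += 1
    return hits / reps


estimated_coverage = coverage()
print(f"Estimated coverage: {estimated_coverage:.3f}")
```

The estimated coverage should be close to 0.95. It describes the long-run behavior of the procedure, not the probability that any one computed interval contains μ.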


Bayesian statistics provides a natural method for updating uncertainty in the light of evidence. Data
are still assumed to come from a distribution belonging to a known parametric family. However,
the Bayesian outlook toward inference is founded on the subjective interpretation of probability.
Subjective probability is a way of stating our belief in the validity of a random event. The following
example will illustrate the idea. Suppose we are interested in the proportion of all undergraduate
students at a particular university who take on off-campus jobs for at least 20 hours a week.
Suppose we randomly select, say, 50 students from this university and obtain the proportion of
students who have off-campus jobs for at least 20 hours a week. Let us assume that the sample
proportion is 30/50 = 0.6. In a frequentist approach, all of the inferential procedures, such as point
estimation, interval estimation, or hypothesis testing, are based on the sampling distribution.


That is, even though we are analyzing only one data set, it is necessary to know the
mean, standard deviation, and shape of the sampling distribution of the proportion for the correct
interpretation of classical inferential procedures. In the subjective interpretation of probability, the
proportion of undergraduates who work at an off-campus job for at least 20 hours a week is
assumed to be unknown and random. A probability distribution, called the prior, is used to represent
our knowledge or belief about the location of this proportion before any data are collected.
For instance, the college placement office already may have an opinion on this proportion based
on its earlier experience. The classical approach ignores this prior knowledge, whereas the Bayesian
approach incorporates this knowledge with the current observed data to update the value of this
proportion. That is, after the data are collected our opinion about the proportion may change. Using
Bayes’ rule, we will compute the posterior probability distribution for the proportion, based on our
prior belief and evidence from the data. All of our inferences about the proportion are made by
computing appropriate statistics of the posterior distribution.


The Bayesian approach seeks to optimally merge information from two sources: (1) knowledge that
is known from theory or opinion formed at the beginning of the research in the form of a prior,
and (2) information contained in the data in the form of likelihood functions. Basically, the prior
distribution represents our initial belief, whereas the information in the data is expressed by the
likelihood function. Combining the prior distribution and the likelihood function, we can obtain the posterior
distribution. This expresses our revised uncertainty in light of the data. The main difference between
the Bayesian approach and the classical approach is that in the Bayesian setting, the parameter is
viewed as a random variable, whereas the classical approach considers the parameter to be fixed but
unknown. The parameter is random in the sense that we can assign to it a subjective probability
distribution that describes our confidence about the actual value of the parameter.


Some of the reasons for using Bayesian approaches are as follows: (1) Most Bayesian inferential conclusions
are made conditional on the observed data. Unlike the traditional approach, one need not be
concerned with data sets other than the one that is observed. There is no need to discuss sampling
distributions using the Bayesian approach. Also, (2) from a Bayesian viewpoint, it is legitimate to talk
about the probability that the proportion falls in a specific interval, say (0.2, 0.6), or the probability
that a hypothesis is true. Too often, traditional inferential conclusions are misstated; for example, if
a 90% confidence interval computed from a sample for a parameter is (0.2, 0.6), it is common for the
student to incorrectly state that the population parameter falls in the interval (0.2, 0.6) with probability
at least 0.90. The Bayesian viewpoint provides a convenient model for implementing the scientific
method. The prior probability distribution can be used to state initial beliefs about the population
of interest, relevant sample data are collected, and the posterior probability distribution reflects one’s
new updated beliefs about the population parameter in light of the new data that were collected. All
inferences about the parameter are made by computing appropriate summaries of the posterior
probability distribution. Because of formidable theoretical and computational challenges, the Bayesian
approach had, until recently, found relatively limited use. Recent advances in Bayesian analysis combined with the
growing power of computers are making Bayesian methods practical and increasingly popular. The
Markov chain Monte Carlo (MCMC) method described in Section 13.5 is one of the computationally
intensive methods that is often useful in Bayesian estimation.


11.2 BAYESIAN POINT ESTIMATION 


The cornerstone of Bayesian methodology is Bayes' theorem. It helps us to update our beliefs, in the
form of probability statements about the parameters, after the sample has been taken. The conditional
distribution of the parameters after observing the data is called the posterior distribution; it integrates
the prior and the sample information. Suppose we have two discrete random variables, X and Y.
Then the joint probability mass function (pmf) can be written as $p(x, y) = p(x|y)\,p_Y(y)$, and the marginal
pmf of X is $p_X(x) = \sum_y p(x, y) = \sum_y p(x|y)\,p_Y(y)$. Then Bayes’ rule for the
conditional $p(y|x)$ is

$$p(y|x) = \frac{p(x, y)}{p_X(x)} = \frac{p(x|y)\,p_Y(y)}{p_X(x)} = \frac{p(x|y)\,p_Y(y)}{\sum_y p(x|y)\,p_Y(y)}.$$

The denominator in this expression is a fixed normalizing factor that ensures that $\sum_y p(y|x) = 1$.
If Y is continuous, Bayes' theorem can be stated as

$$p(y|x) = \frac{p(x|y)\,p_Y(y)}{\int p(x|y)\,p_Y(y)\,dy},$$

where the integral is over the range of values of y. These two equations are the Bayes formulas for
random variables.


In Bayesian terminology, $p_Y(y)$ represents the probability statement of our prior belief, $p(x|y)$ is
the probability of the data x given y, which is called the likelihood, and the updated
probability $p(y|x)$ is the posterior. Because $p_X(x)$ (which is the likelihood accumulated over all possible
prior values) does not depend on y, we can express the posterior distribution as proportional (∝) to
[(likelihood) × (prior distribution)], that is,

$$p(y|x) \propto p(x|y)\,p_Y(y).$$


We use the notation f(x|θ) to represent a probability distribution whose population parameter θ is
considered to be a random variable. One of the problems is then to find a point estimate of
the parameter θ (possibly a vector) for the population with distribution f(x|θ). Assume
that π(θ) is the prior distribution of θ, which reflects the experimenter’s prior belief about θ. We will
not distinguish between scalars and vectors; which is meant will be clear from the specific situation.
Suppose that we have a random sample $X = (X_1, \ldots, X_n)$ of size n from f(x|θ). Then the posterior
distribution can be written as

$$f(\theta|X_1, \ldots, X_n) = \frac{f(\theta, X_1, \ldots, X_n)}{f(X_1, \ldots, X_n)} = \frac{L(X_1, \ldots, X_n|\theta)\,\pi(\theta)}{f(X_1, \ldots, X_n)},$$

where $L(X_1, \ldots, X_n|\theta)$ is the likelihood function. Letting C represent all terms that do not involve θ
(in this case, $C = 1/f(X_1, \ldots, X_n)$), we have

$$f(\theta|X_1, \ldots, X_n) = C\,L(X_1, \ldots, X_n|\theta)\,\pi(\theta).$$

For specific sample values $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$, the foregoing equation can be written in a
compact form as

$$f(\theta|x) \propto f(x|\theta)\,\pi(\theta), \quad \text{where } x = (x_1, x_2, \ldots, x_n).$$

This can be expressed as

(posterior distribution) ∝ (prior distribution) × (likelihood).

The full result including the normalization can be written as

(posterior distribution) = [(prior distribution) × (likelihood)] / [Σ (prior × likelihood)],

where the denominator is a fixed normalizing factor obtained by accumulating the likelihood over all
possible prior values. We can now give a formal definition.


Definition 11.2.1 The distribution of θ, given data $x_1, x_2, \ldots, x_n$, is called the posterior distribution,
which is given by

$$\pi(\theta|x) = \frac{f(x|\theta)\,\pi(\theta)}{g(x)}, \tag{11.1}$$

where g(x) is the marginal distribution of X. The Bayes estimate of the parameter θ is the posterior mean.


The marginal distribution g(x) can be calculated using the formula

$$g(x) = \begin{cases} \sum_{\theta} f(x|\theta)\,\pi(\theta), & \text{in the discrete case} \\ \int_{-\infty}^{\infty} f(x|\theta)\,\pi(\theta)\,d\theta, & \text{in the continuous case} \end{cases}$$

where π(θ) is the prior distribution of θ. Here, the marginal distribution g(x) is also called the
predictive distribution of X, because it represents our current predictions of the values of X taking
into account both the uncertainty about the value of θ and the residual uncertainty about the random
variable X when θ is known.


In a Bayesian setting, all the information about θ from the observed data and from the prior knowledge
is contained in the posterior distribution, π(θ|x). In almost all practical cases, because we are
combining our prior information with the information contained in the data, the posterior distribution
provides a more refined estimation of θ than the prior. All inferences from Bayesian methods are
based on the posterior probability distribution of the parameter θ. Using the explanation given later,
we will take the Bayes estimate of a parameter as the posterior mean.
Furthermore, consider a Bayesian statistical inference problem where the parameter is a population
proportion. In Bernoulli trials, the population contains two types called “successes” and “failures.”
The proportion of successes in the population is denoted by θ. We take a random sample of size n
from the population and observe s successes and f failures. The goal is to learn about the unknown
proportion θ on the basis of these data.


In this situation, a model is represented by the population proportion θ. We do not know its
value. In Chapter 5, we have seen that we could use the maximum likelihood estimator (MLE)
for estimating θ, which did not use any prior knowledge we may have about θ. Note that the
maximum likelihood estimate is broadly equivalent to finding the mode of the likelihood. In
a Bayesian setting, we represent our beliefs about the location of θ in terms of a prior probability
distribution. We introduce proportion inference by using a discrete prior distribution for θ.
We can construct a prior by specifying a list of possible values for the proportion θ, and then
assigning probabilities to these values that reflect our knowledge about θ. Then the posterior
probabilities can be computed using Bayes' theorem. The following example illustrates this
concept.


Example 11.2.1 
It is believed that cross-fertilized plants produce taller offspring than self-fertilized plants. In order to
obtain an estimate of the proportion θ of cross-fertilized plants that are taller, an experimenter observes a
random sample of 15 pairs of plants that are exactly the same age. Each pair is grown in the same conditions,
with one plant cross-fertilized and the other self-fertilized. Based on previous experience, the experimenter
believes that the following are possible values of θ and that the prior probability for each value of θ (prior
weight) is π(θ).


θ:       0.80    0.82    0.84    0.86    0.88    0.90
π(θ):    0.13    0.15    0.22    0.25    0.15    0.10


From the experiment, it is observed that in 13 of 15 pairs, the cross-fertilized plant is taller. Create a table
with columns for the prior π(θ), the likelihood $L(X_1, X_2, \ldots, X_n|\theta)$ for the different values of θ and for the
given sample, the prior times the likelihood, and the posterior probability of θ. Based on the posterior
probabilities, what value of θ has the highest support? Also, find E(θ) based on the posterior probabilities.


Solution 
The likelihood of obtaining 13 taller cross-fertilized plants in 15 pairs, for each of the prior values of θ, is
given by the binomial probability function $\binom{15}{13}\theta^{13}(1 - \theta)^{2}$. For example, if the prior value of θ is 0.80,
then the likelihood of θ given the sample is

$$f(x|\theta) = \binom{15}{13}(0.8)^{13}(0.2)^{2} = 0.2309.$$
Table 11.1
Prior values    Prior    Likelihood of θ    Prior times       Posterior
of θ            π(θ)     given sample       likelihood        probability of θ
0.80            0.13     0.2309             3.0017 × 10^-2    0.11029
0.82            0.15     0.2578             0.03867           0.14208
0.84            0.22     0.2787             6.1314 × 10^-2    0.22528
0.86            0.25     0.2897             7.2425 × 10^-2    0.2661
0.88            0.15     0.2870             0.04305           0.15817
0.90            0.10     0.2669             0.02669           0.098064
Total                                       0.27217           0.99998 ≈ 1.0


From Table 11.1 we obtain $\sum(\text{prior} \times \text{likelihood}) = 0.27217$. Hence, the normalized value corresponding to
$\theta = 0.80$ is the posterior probability $f(\theta|x)$, which is equal to $0.030017/0.27217 = 0.11029$. In the same way, we
obtain the posterior distribution of the proportion $\theta$ using the discrete prior given in Table 11.1.


When we substitute in Bayes' rule, the factor $\binom{15}{13}$ cancels. Hence, in the calculation of the
likelihood function, we could have just used $\theta^{13}(1-\theta)^{2}$ instead of the full expression $\binom{15}{13}\theta^{13}(1-\theta)^{2}$.


Thus, the Bayesian estimate of $\theta$ is
$E(\theta)$ = (0.80)(0.11029) + (0.82)(0.14208) + (0.84)(0.22528)
       + (0.86)(0.2661) + (0.88)(0.15817) + (0.90)(0.098064)
       = 0.85027 ≈ 0.85.


It may be noted that the MLE of $\theta$ is 13/15 = 0.867.
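
The table computation of this example can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `discrete_posterior` and the variable names are illustrative choices and not part of the text, and the likelihoods are computed at full precision, so the last digits can differ slightly from the rounded table entries.

```python
from math import comb

def discrete_posterior(thetas, prior, successes, trials):
    """Return (likelihoods, posterior) for a binomial sample and a discrete prior."""
    likelihood = [
        comb(trials, successes) * t**successes * (1 - t) ** (trials - successes)
        for t in thetas
    ]
    weighted = [p * l for p, l in zip(prior, likelihood)]  # prior times likelihood
    total = sum(weighted)                                  # normalizing constant
    posterior = [w / total for w in weighted]
    return likelihood, posterior

thetas = [0.80, 0.82, 0.84, 0.86, 0.88, 0.90]
prior = [0.13, 0.15, 0.22, 0.25, 0.15, 0.10]

likelihood, posterior = discrete_posterior(thetas, prior, successes=13, trials=15)
posterior_mean = sum(t * p for t, p in zip(thetas, posterior))
most_supported = thetas[posterior.index(max(posterior))]
```

Passing `prior = [1/6] * 6` to the same function gives the flat-prior version of the calculation.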


In Example 11.2.1, the prior is called an informative prior because it favors certain values of $\theta$; for
example, for the value $\theta = 0.86$ the prior probability $\pi(\theta)$ is 0.25, which is higher than for all the other
values. If there were no information or no strong prior opinions, then we could select a noninformative
prior, which would assign an equal prior probability of 1/6 to each of the possible values of $\theta$.
A noninformative prior (also called a flat or uniform prior) provides little or no information. Depending
on the situation, noninformative priors may be quite dispersed, may avoid only impossible values of
the parameter, and oftentimes give results similar to those obtained by classical frequentist methods.


Example 11.2.2
Repeat Example 11.2.1 using a noninformative prior, $\pi(\theta) = 1/6$, for each given value of $\theta$.


Solution
Here $\pi(\theta) = 1/6$ for each value of $\theta$. See Table 11.2.


Table 11.2

Prior value    Prior            Likelihood of $\theta$   Prior times    Posterior probability
of $\theta$    $\pi(\theta)$    given sample             likelihood     of $\theta$
0.80           1/6              0.2309                   0.038483       0.14333
0.82           1/6              0.2578                   0.042967       0.16003
0.84           1/6              0.2787                   0.046450       0.17300
0.86           1/6              0.2897                   0.048283       0.17982
0.88           1/6              0.2870                   0.047833       0.17815
0.90           1/6              0.2669                   0.044483       0.16567
Total                                                    0.2685         1.0


The Bayesian estimate for the noninformative prior is
$E(\theta)$ = (0.80)(0.14333) + (0.82)(0.16003) + (0.84)(0.173)
       + (0.86)(0.17982) + (0.88)(0.17815) + (0.90)(0.16567)
       = 0.85173.


It should be noted that because the choice of prior in Example 11.2.1 is only mildly informative, we
do not see much difference in the values of the Bayesian estimates. In general, it is difficult to construct
an acceptable prior, because most often it has to be based on subjective experience. It is therefore
often easier to use a "noninformative" prior. For example, if we have no information on the values
of the proportion $\theta$, then one type of standard "noninformative" prior is to take the proportion $\theta$ as
one of the equally spaced values 0, 0.1, 0.2, ..., 0.9, 1 and to assign to each value of $\theta$ the same
probability, $\pi(\theta) = 1/11$. This prior is convenient and may work reasonably well when we do not
have many data. It is fairly easy to construct a prior when there exists considerable prior information
about the proportion of interest.


The posterior distribution gives us information regarding the likelihood of values of $\theta$ given sample
data. Then the question is how to use this information to estimate $\theta$. Instead of having explicit
probabilities, the prior may be given through an assumed probability distribution. We illustrate the
calculations involved to find the posterior distribution in the following example.


Example 11.2.3
Let X be a binomial random variable with parameters n and p. Assume that the prior distribution of p is
uniform on [0,1]. Find the posterior distribution, $f(p|x)$.


Solution 
Because X is binomial, the likelihood function is given by

$$f(x|p) = \binom{n}{x} p^{x}(1-p)^{n-x}.$$

Because p is uniform on [0,1], $\pi(p) = 1$, $0 < p < 1$.
Then the posterior distribution is given by

$$f(p|x) \propto f(x|p)\,\pi(p) = \binom{n}{x} p^{x}(1-p)^{n-x}, \quad x = 0, 1, \ldots, n,$$

which is the same as the likelihood.


This example illustrates that if the prior is noninformative (uniform), then the posterior is essentially 
the likelihood function. In the case where the prior and posterior are of the same functional form, 
we call it a conjugate prior. Bayesian inference becomes simpler when the prior density has the same 
functional form as the likelihood (which is the case for the conjugate prior) or when data are an 
independent sample from an exponential family (such as normal, Poisson, or binomial). 


The following example demonstrates the method of finding the posterior distribution for a continuous
random variable.


Example 11.2.4
Suppose that X is a normal random variable with mean $\mu$ and variance $\sigma^2$, where $\sigma^2$ is known and $\mu$ is
unknown. Suppose that $\mu$ behaves as a random variable whose probability distribution (prior) $\pi(\mu)$
is also normal with mean $\mu_p$ and variance $\sigma_p^2$, both assumed to be known or estimated. Find
the posterior distribution $f(\mu|x)$.


Solution 
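Using Bayes' theorem, we have

$$f(\mu|x) = \frac{f(x|\mu)\,\pi(\mu)}{\int f(x|\mu)\,\pi(\mu)\,d\mu}
= \frac{\dfrac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-\mu)^2/2\sigma^2}\;\dfrac{1}{\sqrt{2\pi}\,\sigma_p}\,e^{-(\mu-\mu_p)^2/2\sigma_p^2}}
{\displaystyle\int_{-\infty}^{\infty}\dfrac{1}{\sqrt{2\pi}\,\sigma}\,e^{-(x-\mu)^2/2\sigma^2}\;\dfrac{1}{\sqrt{2\pi}\,\sigma_p}\,e^{-(\mu-\mu_p)^2/2\sigma_p^2}\,d\mu}. \qquad (11.2)$$

Consider the exponential term in (11.2), namely,

$$\frac{(x-\mu)^2}{2\sigma^2} + \frac{(\mu-\mu_p)^2}{2\sigma_p^2}
= \frac{1}{2}\left[\left(\frac{1}{\sigma^2}+\frac{1}{\sigma_p^2}\right)\mu^2
- 2\left(\frac{x}{\sigma^2}+\frac{\mu_p}{\sigma_p^2}\right)\mu
+ \left(\frac{x^2}{\sigma^2}+\frac{\mu_p^2}{\sigma_p^2}\right)\right]$$

$$= \frac{\sigma^2+\sigma_p^2}{2\sigma^2\sigma_p^2}
\left[\mu - \left(\frac{\sigma^2}{\sigma_p^2+\sigma^2}\,\mu_p + \frac{\sigma_p^2}{\sigma_p^2+\sigma^2}\,x\right)\right]^2
+ (\text{terms not containing } \mu).$$

Thus,

$$f(\mu|x) = K \exp\left\{-\frac{\sigma^2+\sigma_p^2}{2\sigma^2\sigma_p^2}
\left[\mu - \left(\frac{\sigma^2}{\sigma_p^2+\sigma^2}\,\mu_p + \frac{\sigma_p^2}{\sigma_p^2+\sigma^2}\,x\right)\right]^2\right\},$$

where K does not contain $\mu$.
This implies that the posterior density $f(\mu|x)$ is the pdf of a normal random variable with mean

$$\frac{\sigma^2}{\sigma_p^2+\sigma^2}\,\mu_p + \frac{\sigma_p^2}{\sigma_p^2+\sigma^2}\,x$$

and variance

$$\frac{\sigma_p^2\,\sigma^2}{\sigma_p^2+\sigma^2}.$$

If we let $\tau_p = 1/\sigma_p^2$ and $\tau = 1/\sigma^2$, then the posterior density can be rewritten as the pdf of a normal random
variable with mean $\frac{1}{\tau_p+\tau}(\tau_p\mu_p + \tau x)$ and variance $\frac{1}{\tau_p+\tau}$.
As an example, suppose that $\mu_p = 100$, $\sigma_p = 15$, $\sigma = 10$, and $x = 115$. Then $f(\mu|x)$ is the pdf of a normal
random variable with

$$\text{Mean} = \frac{100}{100+225}(100) + \frac{225}{100+225}(115) = 110.4$$

and

$$\text{Variance} = \frac{(100)(225)}{100+225} = 69.2.$$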
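
The closed-form update for a single normal observation is easy to check numerically. The sketch below assumes the notation of Example 11.2.4; the function name `normal_posterior_single` and its argument names are hypothetical and chosen only for illustration.

```python
def normal_posterior_single(x, sigma2, mu_p, sigma_p2):
    """Posterior mean and variance of mu for one observation x ~ N(mu, sigma2)
    and a N(mu_p, sigma_p2) prior on mu."""
    weight_prior = sigma2 / (sigma_p2 + sigma2)    # weight given to the prior mean
    weight_data = sigma_p2 / (sigma_p2 + sigma2)   # weight given to the observation
    mean = weight_prior * mu_p + weight_data * x
    variance = sigma_p2 * sigma2 / (sigma_p2 + sigma2)
    return mean, variance

# Numbers from the worked example: mu_p = 100, sigma_p = 15, sigma = 10, x = 115.
post_mean, post_var = normal_posterior_single(x=115, sigma2=10**2, mu_p=100, sigma_p2=15**2)

# The same variance expressed through precisions tau_p = 1/sigma_p^2 and tau = 1/sigma^2.
precision_form_var = 1 / (1 / 15**2 + 1 / 10**2)
```

The two weights sum to one, so the posterior mean always lies between the prior mean and the observation.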


11.2.1 Criteria for Finding the Bayesian Estimate 


In the Bayesian approach to parameter estimation, we use both the prior and observations. This leads
to an estimation strategy based on the posterior distribution. How do we know that the estimate
thus obtained is "good"? To assess the quality of likely estimators, we use a loss function $L(\theta, a)$ that
measures the loss incurred by using $a$ as an estimate of $\theta$. Here $\theta$ is the parameter being estimated (in
real-world problems it is not known), and $a$ is the estimate of $\theta$. Then the "optimal" or "best" estimate
$a = \hat{\theta}$ is chosen so as to minimize the expected loss $E[L(\theta, \hat{\theta})]$, where the expectation is taken over $\theta$
with respect to the posterior distribution $f(\theta|x)$. Here we mention two commonly used loss
functions, the quadratic and the absolute error loss functions, and the resulting estimates.


(1) A quadratic (or squared error) loss function is of the form $L(\theta, a) = (a - \theta)^2$. In this case,

$$E[L(\theta, a)] = \int L(\theta, a)\, f(\theta|x_1, \ldots, x_n)\,d\theta = \int (a-\theta)^2 f(\theta|x_1, \ldots, x_n)\,d\theta.$$

Differentiating with respect to $a$ and equating to zero, we obtain

$$2\int (a-\theta)\, f(\theta|x_1, \ldots, x_n)\,d\theta = 0.$$

This implies

$$a = \int \theta\, f(\theta|x_1, \ldots, x_n)\,d\theta.$$

This is the posterior mean (expected value) of $\theta$, $E(\theta|x_1, \ldots, x_n)$. Hence the quadratic loss function is
minimized by taking the estimate of $\theta$, that is, $\hat{\theta}$, to be the posterior mean. In previous examples in
this section, we used this value as the estimate $\hat{\theta}$. Note that what the quadratic loss function displays
is that if the estimate $\hat{\theta}$ and the true parameter $\theta$ are close to each other, the loss we expect is very
small. Likewise, if the difference is larger, the expected loss in estimating $\theta$ with $\hat{\theta}$ is going to be large.


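(2) An absolute error loss function is of the form $L(\theta, a) = |a - \theta|$. In this case,

$$E[L(\theta, a)] = \int L(\theta, a)\, f(\theta|x_1, \ldots, x_n)\,d\theta
= \int_{\theta=-\infty}^{a} (a-\theta)\, f(\theta|x_1, \ldots, x_n)\,d\theta
+ \int_{\theta=a}^{\infty} (\theta-a)\, f(\theta|x_1, \ldots, x_n)\,d\theta.$$

Differentiating with respect to $a$ and equating to zero, we obtain

$$\int_{\theta=-\infty}^{a} f(\theta|x_1, \ldots, x_n)\,d\theta - \int_{\theta=a}^{\infty} f(\theta|x_1, \ldots, x_n)\,d\theta = 0.$$

The minimum loss is attained when the values of both integrals are equal to 1/2. This can be achieved
by taking $\hat{\theta}$ to be the posterior median.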
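
Both results can be illustrated numerically on a discrete posterior. The sketch below is only an illustration under stated assumptions: it reuses the rounded posterior probabilities of Table 11.1, searches a grid of candidate estimates with step 0.001 (a choice made here for convenience), and uses hypothetical helper names.

```python
thetas = [0.80, 0.82, 0.84, 0.86, 0.88, 0.90]
posterior = [0.11029, 0.14208, 0.22528, 0.26610, 0.15817, 0.098064]  # Table 11.1, rounded

def quadratic_loss(theta, a):
    return (a - theta) ** 2

def absolute_loss(theta, a):
    return abs(a - theta)

def expected_loss(a, loss):
    """Posterior expected loss of using a as the estimate of theta."""
    return sum(p * loss(t, a) for t, p in zip(thetas, posterior))

# Candidate estimates on a grid from 0.800 to 0.900 in steps of 0.001.
candidates = [0.80 + k / 1000 for k in range(101)]

best_quadratic = min(candidates, key=lambda a: expected_loss(a, quadratic_loss))
best_absolute = min(candidates, key=lambda a: expected_loss(a, absolute_loss))
```

On this grid the quadratic loss is minimized at the candidate closest to the posterior mean (about 0.850), and the absolute error loss is minimized at 0.86, the value at which the cumulative posterior probability first exceeds 1/2.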


The following can be considered as a general Bayesian procedure for point parameter estimation. 


BAYESIAN PARAMETER ESTIMATION PROCEDURE
1. Consider the unknown parameter $\theta$ as a random variable.
2. Use a probability distribution (prior) to describe the uncertainty about the unknown parameter.
3. Update the parameter distribution using Bayes' theorem:

   $P(\theta|\text{Data}) \propto P(\theta)\,P(\text{Data}|\theta)$,

   that is,

   (posterior of $\theta$) $\propto$ (prior of $\theta$) · (likelihood).

4. Under the quadratic loss function, the Bayes estimator of $\theta$ is the expected value of the posterior
   distribution $P(\theta|\text{Data})$.
5. Under the absolute error loss function, the Bayes estimator of $\theta$ is the posterior median.


From the procedure of Bayesian estimation, it is clear that a bad choice of prior may result in a
bad estimate. Generally, if the priors are based on a previous and trustworthy sample, Bayesian
estimation methods are desirable. A schematic figure of steps involved in the Bayesian estimate is
given in Figure 11.1.


FIGURE 11.1 Bayesian estimation procedure. (Diagram labels: prior information $P(\theta)$; likelihood $P(\text{Data}|\theta)$; posterior $P(\theta|\text{Data})$; loss function.)


In this chapter, we use only the quadratic loss function unless it is explicitly stated otherwise. We 
also mention that this loss function is very popular because of its analytic tractability. We now derive 
Bayesian point estimates for some specific distributions. 


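Whereas uniform priors are useful in noninformative situations, the beta family of distributions
is one of the commonly used informative priors. Distributions in the beta family take values in the
interval (0, 1). Recall that if $X \sim \text{beta}(\alpha, \beta)$, then the pdf of X is given by

$$f(x) = \begin{cases} \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, & 0 < x < 1 \\ 0, & \text{otherwise,} \end{cases} \qquad \alpha > 0,\ \beta > 0.$$

The beta pdf can be written as

$$f(x) = C\, x^{\alpha-1}(1-x)^{\beta-1} \propto x^{\alpha-1}(1-x)^{\beta-1},$$

where $C = \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}$. We also know that

$$E(X) = \frac{\alpha}{\alpha+\beta} \quad \text{and} \quad \mathrm{Var}(X) = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)}.$$
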
Example 11.2.5 
Let $X_1, \ldots, X_n$ be a sample from a geometric distribution with parameter $p$, $0 < p < 1$. Assume that the
prior distribution of $p$ is beta with $\alpha = 4$ and $\beta = 4$.
(a) Find the posterior distribution of p.
(b) Find the Bayes estimate under the quadratic loss function.
Solution
(a) Because p is Beta(4, 4), the prior density is

$$\pi(p) = \frac{\Gamma(8)}{\Gamma(4)\Gamma(4)}\, p^{3}(1-p)^{3} = 140\,p^{3}(1-p)^{3}.$$

Because the random variables $X_i$ have a geometric distribution with parameter p, the likelihood is given by

$$L(x_1, \ldots, x_n|p) = \prod_{i=1}^{n} p(1-p)^{x_i-1} = p^{n}(1-p)^{\sum_{i=1}^{n} x_i - n}.$$

The product of the likelihood function and the prior is given by

$$p^{n}(1-p)^{\sum_{i=1}^{n} x_i - n}\left[140\,p^{3}(1-p)^{3}\right] = 140\,p^{n+3}(1-p)^{\sum_{i=1}^{n} x_i - n + 3}.$$

Because (posterior of p) $\propto$ (prior of p) · (likelihood), the constant 140 and the normalizing constant in the
denominator of Equation (11.1) combine into a single constant, and the posterior distribution (because
$\alpha - 1 = n + 3$ and $\beta - 1 = \sum_{i=1}^{n} x_i - n + 3$) is Beta$\left(n + 4, \sum_{i=1}^{n} x_i - n + 4\right)$.

(b) Recall that for a Beta($\alpha$, $\beta$) random variable, the mean is $\alpha/(\alpha + \beta)$. Because the Bayes estimate
is the posterior mean, the mean of Beta$\left(n + 4, \sum_{i=1}^{n} x_i - n + 4\right)$ is

$$\frac{n+4}{(n+4) + \left(\sum_{i=1}^{n} x_i - n + 4\right)} = \frac{n+4}{\sum_{i=1}^{n} x_i + 8}.$$

Note that for large n, the Bayes estimate is approximately $n/\sum_{i=1}^{n} x_i$, which is the MLE of p.
In general, for a Bernoulli random variable with unknown probability of success p in [0,1], the usual
conjugate prior is the beta distribution, where the parameters of the beta distribution are chosen to
reflect any prior information that we have.
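
The update rule of Example 11.2.5 can be written as a short function. This is a sketch only: the sample values below are hypothetical (the example does not give data), and the function names are illustrative.

```python
from fractions import Fraction

def geometric_beta_posterior(xs, alpha=4, beta=4):
    """Beta posterior parameters for geometric data xs (trial counts, each >= 1)
    under a Beta(alpha, beta) prior on p."""
    n = len(xs)
    total = sum(xs)
    return alpha + n, beta + total - n

def beta_mean(a, b):
    """Mean a / (a + b) of a Beta(a, b) distribution, kept as an exact fraction."""
    return Fraction(a, a + b)

sample = [2, 1, 4, 3, 1]  # hypothetical observations, n = 5 and sum = 11
a_post, b_post = geometric_beta_posterior(sample)
bayes_estimate = beta_mean(a_post, b_post)      # (n + 4) / (sum + 8)
mle = Fraction(len(sample), sum(sample))        # n / sum, for comparison
```

For this hypothetical sample the posterior is Beta(9, 10), so the Bayes estimate is 9/19, compared with the MLE 5/11.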
We will follow the idea of the previous example in a binomial experiment of tossing a coin. 

Example 11.2.6 
Suppose we are flipping a biased coin, where the probability of heads p could be any value between 0 and
1. Given a sequence of toss samples $x_1, x_2, \ldots, x_n$, we want to estimate $P(H) = p$. We may have two
sources of information: our prior belief, which we will express as a beta distribution, and the data, which
could come from counts of heads x in n = 20 independent flips of the coin, say x = 13. Suppose that in six
prior tosses, we observed three heads and three tails, which leads us to believe that the value of p is near
0.5. Obtain the posterior distribution of p.


Solution 
Here our prior belief or assumption can be written in terms of a beta distribution as

$$\pi(p) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, p^{\alpha-1}(1-p)^{\beta-1},$$

where $\alpha = 4$ and $\beta = 4$. That is (noting $\Gamma(n) = (n-1)!$),

$$\pi(p) = \frac{7!}{3!\,3!}\, p^{3}(1-p)^{3}.$$

Hence, $\pi(p) \propto p^{3}(1-p)^{3}$. Because the mean of a beta distribution is $\alpha/(\alpha+\beta)$ and the variance is
$\alpha\beta/[(\alpha+\beta)^2(\alpha+\beta+1)]$, for the prior,

$$\text{Mean}(p) = \frac{4}{4+4} = 0.5,$$

and

$$\text{Var}(p) = \frac{(4)(4)}{(4+4)^2(4+4+1)} = 0.028.$$


Let X denote the number of heads in 20 flips of this coin. Then X has a binomial distribution, and the pmf
is given by

$$f(x|p) = \binom{20}{x} p^{x}(1-p)^{20-x}, \quad x = 0, 1, \ldots, 20.$$

This we can write as

$$f(x|p) \propto p^{x}(1-p)^{20-x}.$$

In the 20 flips we have observed 13 heads. Then fix x = 13, and we are interested in the likelihood, which is
the relative value of the function at different values of p:

$$f(13|p, 20) \propto p^{13}(1-p)^{7}.$$

The posterior probability of p, given x = 13, is

$$\pi(p|x = 13) \propto f(x|p)\,\pi(p) = \left(p^{13}(1-p)^{7}\right)\left(p^{3}(1-p)^{3}\right) = p^{16}(1-p)^{10}.$$


Thus, the posterior is a beta distribution with $\alpha = 17$ and $\beta = 11$. Consequently, we can now obtain the mean
and variance of p as

$$\text{Mean}(p) = \frac{17}{17+11} = 0.607$$

and

$$\text{Var}(p) = \frac{(17)(11)}{(17+11)^2(17+11+1)} = 0.0082.$$

Note that the prior was a beta distribution with mean 0.5 and variance 0.028. Figure 11.2 gives the prior and
posterior densities.


FIGURE 11.2 Prior and posterior distributions for the proportions.


Note that if we had ignored the prior and just taken the point estimate, then the MLE of p is
$\hat{p} = 13/20 = 0.65$. Compare this with the Bayesian estimate of p = 0.607. Because Beta(1, 1) is the Uniform [0, 1],
the method of the previous example can be used for noninformative priors. The method could also be used in
many applications. For example, suppose p represents the proportion of infected individuals in a population,
and x is the number of infected individuals in a sample of size n. Then with a noninformative prior, we can
show that the posterior of p is Beta(x + 1, n − x + 1). This type of setting can be used for estimating the
true proportion of infected individuals in the population.
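
The beta-binomial update used in this example reduces to adding counts to the prior parameters, which the following sketch makes explicit. The function names are illustrative and not from the text; exact fractions are used so that the mean and variance can be compared with the rounded values above.

```python
from fractions import Fraction

def beta_binomial_update(alpha, beta, successes, trials):
    """Posterior Beta parameters after observing `successes` in `trials` Bernoulli trials."""
    return alpha + successes, beta + trials - successes

def beta_mean(a, b):
    return Fraction(a, a + b)

def beta_variance(a, b):
    return Fraction(a * b, (a + b) ** 2 * (a + b + 1))

# Example 11.2.6: Beta(4, 4) prior, 13 heads in 20 flips.
a_post, b_post = beta_binomial_update(4, 4, successes=13, trials=20)
post_mean = float(beta_mean(a_post, b_post))
post_var = float(beta_variance(a_post, b_post))

# Noninformative case: a Beta(1, 1) prior gives Beta(x + 1, n - x + 1).
flat_post = beta_binomial_update(1, 1, successes=13, trials=20)
```

With the uniform prior the same 13 heads in 20 flips give a Beta(14, 8) posterior, whose mean 14/22 ≈ 0.636 lies closer to the MLE 0.65 than the informative-prior estimate does.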

Example 11.2.7 
Suppose that for the past million days we have been predicting whether the sun will rise the next morning or
not. Each evening we said that the sun would rise the next morning (R), and we were right (R) on all these days.
Suppose that after these $10^6$ evenings we predict once more that the sun will rise on the next day. What is the
probability that the sun will rise the next day?


Solution
The problem can be cast in the following table form.


Day:          1    2    ...   $10^6$   $10^6 + 1$
Prediction:   R    R    ...   R        R
Outcome:      R    R    ...   R        ?


$P(R|R) = 1$ if we use the frequency method of estimation (for example the MLE). Let us now consider the
Bayes method. Suppose the prior is uniform on [0,1]. That is,

$$\pi(p) = \begin{cases} 1, & \text{if } 0 < p < 1 \\ 0, & \text{otherwise.} \end{cases}$$

Suppose we predict n times and we succeed x times. Then

$$f(x|p) = \binom{n}{x} p^{x}(1-p)^{n-x}.$$

The joint pdf is given by

$$f(x, p) = f(x|p)\,\pi(p) = \binom{n}{x} p^{x}(1-p)^{n-x}, \quad x = 0, 1, \ldots, n;\ 0 < p < 1.$$

By Bayes' theorem, the posterior pdf $\pi(p|x)$ is

$$\pi(p|x) = \frac{f(x|p)\,\pi(p)}{\int_{0}^{1} f(x|p)\,\pi(p)\,dp} = K(n, x)\, p^{x}(1-p)^{n-x}, \quad 0 < p < 1,\ 0 \le x \le n,$$

which is a beta probability distribution. Recall that the beta density is given by

$$f(y) = \frac{y^{\alpha-1}(1-y)^{\beta-1}}{B(\alpha, \beta)}$$

and $E(Y) = \dfrac{\alpha}{\alpha+\beta}$. Thus, with $\alpha = x + 1$ and $\beta = n - x + 1$,

$$E(p|x) = \frac{x+1}{(x+1)+(n-x+1)} = \frac{x+1}{n+2}.$$

In our example, $x = 10^6$ and $n = 10^6$, which implies that the posterior mean is given by

$$\hat{p} = \frac{10^6+1}{10^6+2}.$$


Example 11.2.8 
Let $X_1, X_2, \ldots, X_n$ be $N(\mu, \sigma^2)$ random variables with prior $\pi(\mu)$ having an $N(\mu_0, \sigma_0^2)$ distribution, with
known $\sigma^2$.

(a) Obtain the posterior distribution of $\mu$.

(b) Suppose it is known from past experience that the weight loss for a particular combination of
diet and exercise program (if followed for a month) is normally distributed with mean 10 lb and
standard deviation of 2 lb. A random sample of five persons who went through this program for a
month produced the following weight loss in pounds:


14   8   11   7   11


What is the point estimate of the mean, $\mu$? Assume $\sigma^2 = 4$.

Solution
(a) Because $\pi(\mu) \sim N(\mu_0, \sigma_0^2)$, we have $\pi(\mu) \propto \exp\left[-(\mu-\mu_0)^2/2\sigma_0^2\right]$, where we omit the terms that do not
depend on $\mu$. From the data $x = (x_1, \ldots, x_n)$ we have the likelihood function

$$L(x_1, \ldots, x_n|\mu) = \prod_{i=1}^{n} f(x_i|\mu) \propto \prod_{i=1}^{n} \exp\left[-\frac{(x_i-\mu)^2}{2\sigma^2}\right]
= \exp\left[-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right],$$

where $\mu$ is determined by the posterior distribution. The product of the likelihood function and the
prior gives the posterior, which is obtained (after some algebra) as follows:

$$P(\mu|x) \propto \pi(\mu)\, f(x|\mu) \propto \exp\left[-(\mu-\mu_1)^2/2\sigma_1^2\right],$$

where

$$\mu_1 = \frac{\dfrac{n}{\sigma^2}\,\bar{x} + \dfrac{1}{\sigma_0^2}\,\mu_0}{\dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_0^2}}$$

and

$$\sigma_1^2 = \frac{1}{\dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_0^2}}.$$

Thus, the posterior distribution of $\mu$ is $N(\mu_1, \sigma_1^2)$.


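(b) Note that the sample mean is $\bar{x} = 10.2$ lb, and the sample standard deviation is $s = 2.77$ lb. Now from
part (a), the posterior distribution of $\mu$ is normal with mean

$$\mu_1 = \frac{\dfrac{n}{\sigma^2}\,\bar{x} + \dfrac{1}{\sigma_0^2}\,\mu_0}{\dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_0^2}}
= \frac{\dfrac{5}{4}(10.2) + \dfrac{1}{4}(10)}{\dfrac{5}{4} + \dfrac{1}{4}} = 10.167$$

and variance

$$\sigma_1^2 = \frac{1}{\dfrac{n}{\sigma^2} + \dfrac{1}{\sigma_0^2}} = \frac{1}{\dfrac{5}{4} + \dfrac{1}{4}} = 0.66667.$$

Thus, the point estimate of $\mu$ is the posterior mean, 10.167. Figure 11.3 represents the prior and
posterior densities of $\mu$.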
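
The calculation in part (b) can be checked with a short script. This is a minimal sketch using the formulas for $\mu_1$ and $\sigma_1^2$ from part (a); the function name `normal_posterior` and its argument names are hypothetical.

```python
from statistics import mean

def normal_posterior(xs, sigma2, mu0, sigma0_2):
    """Posterior mean and variance of mu for data xs ~ N(mu, sigma2)
    with a N(mu0, sigma0_2) prior on mu."""
    n = len(xs)
    precision = n / sigma2 + 1 / sigma0_2           # posterior precision 1 / sigma_1^2
    post_mean = (n / sigma2 * mean(xs) + mu0 / sigma0_2) / precision
    return post_mean, 1 / precision

weight_loss = [14, 8, 11, 7, 11]                    # data from part (b)
post_mean, post_var = normal_posterior(weight_loss, sigma2=4, mu0=10, sigma0_2=4)
```

Letting `sigma0_2` grow very large in this function moves the posterior mean toward the sample mean 10.2 and the posterior variance toward $\sigma^2/n = 0.8$.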


FIGURE 11.3 Prior and posterior densities of $\mu$.


Sometimes, the inverse of the variance in the normal distribution is called the precision of the normal
distribution and is denoted by $\tau = 1/\sigma^2$. Also note that in part (a) of the previous example, if the
prior variance $\sigma_0^2 \to \infty$, then the prior flattens out, $\pi(\mu) \propto c$, a constant. This basically amounts to
saying that prior information on $\mu$ decreases, that is, all values of $\mu$ are equally probable. This corresponds
to a noninformative prior. Also, in this case as $\sigma_0^2 \to \infty$, $\sigma_1^2 \to \sigma^2/n$ and $\mu_1 \to \bar{x}$. Hence, in the limit
(i.e., for noninformative priors), the posterior $f(\mu|x)$ will have an $N(\bar{x}, \sigma^2/n)$ distribution, which is
exactly the same inference as in classical statistics.


In Bayesian inference problems, one of the questions is, which will have relatively more influence, 
prior or likelihood? As we observe a large amount of data, it can be shown that the posterior 
distribution is almost exclusively determined by the data. That is, asymptotically, observed data will 
have a larger influence compared to the choice of prior, and thus the prior will be irrelevant. Hence, 
we can make the following general observations. If the prior is noninformative and we have a large 
data set, then we can expect that the likelihood will have greater influence. Whereas, if we have a 
small data set and an informative prior, then the prior will have a larger influence on the updated 
posterior distribution. Bayesian estimators are more complicated to compute than calculating the 
maximum likelihood estimates in simple cases. However, in complex settings Bayesian statistics are 
often relatively easier to compute. 


One of the problems in using Bayesian analysis is choosing an appropriate prior. There are no specific
rules available for this purpose. For instance, the following priors are commonly used in the literature.
If data are in [0,1], we could use a uniform or beta distribution. If the data are in $[0, \infty)$, normal (with
nonnegative and relatively large $\mu$), gamma, or log-normal distributions are used. If the data are in
$(-\infty, \infty)$, normal or t-distributions are commonly used.


EXERCISES 11.2 


11.2.1. Suppose that in a casino two kinds of dice are used: 98% are fair, and 2%
are loaded such that five comes up 60% of the time and the rest of the numbers are equally
probable. We pick a die at random and roll it three times. We get three consecutive fives.
What is the probability that the die is loaded?

11.2.2. It is believed that cross-fertilized plants produce taller offspring than self-fertilized plants.
In order to obtain an estimate of the proportion $\theta$ of cross-fertilized plants that are
taller, an experimenter observes a random sample of 15 pairs of plants exactly the same
age, with each pair grown in the same conditions with one cross-fertilized and the
other self-fertilized. Based on previous experience, the experimenter believes that the
following are possible values of $\theta$ and prior probabilities for each value (prior weight), $\pi(\theta)$:


$\theta$:        | 0.80 | 0.82 | 0.84 | 0.86 | 0.88 | 0.90
$\pi(\theta)$:   | 0.03 | 0.40 | 0.22 | 0.15 | 0.15 | 0.05


From the experiment, it is observed that in 13 of 15 pairs, the cross-fertilized is taller. 

(a) Create a table with columns for prior, likelihood of $\theta$ given sample, prior times likelihood,
and posterior probability of $\theta$. Based on the posterior probabilities, what value
of $\theta$ has the highest support? Also, find $E(\theta)$ based on the posterior probabilities.

(b) Redo part (a) with a completely noninformative prior, that is, take the prior for the
proportion $\theta$ as one of the equally spaced values 0, 0.1, 0.2, ..., 0.9, 1. Also assign for
each value of $\theta$ the same probability, $\pi(\theta) = 1/11$.

(c) Calculate the MLE of $\theta$ and compare it with the Bayesian estimate.


11.2.3. Consider the problem of estimating p in a binomial distribution. Let X be the number of
successes in a sample of size n.


(a) Let the prior distribution of p be given by Beta(3,1), that is,

$$\pi(p) = \begin{cases} 3p^{2}, & 0 < p < 1 \\ 0, & \text{otherwise.} \end{cases}$$

Find the posterior distribution of p.

Hint:

$$f(x|p) = \begin{cases} \binom{n}{x} p^{x}(1-p)^{n-x}, & x = 0, 1, 2, \ldots, n \\ 0, & \text{otherwise.} \end{cases}$$

(b) Let the prior distribution of p be given by Beta(a, b), that is, $\pi(p) \propto p^{a-1}(1-p)^{b-1}$.
Find the posterior distribution of p.


11.2.4. A biased coin is tossed n times. Let $x_i$ be 1 if the ith toss is heads and 0 if it is tails. Assume a
noninformative prior, $p(\theta) = 1$, $0 < \theta < 1$. Let t be the number of heads obtained. Show
that the posterior distribution of $\theta$ is Beta(t + 1, n − t + 1).


11.2.5. Let $X_1, X_2, \ldots, X_n$ be exponential random variables with parameter $\lambda$. Let the prior $\pi(\lambda)$
be exponentially distributed with parameter $\mu$, which is a fixed and known constant.
(a) Show that the posterior distribution of $\lambda$ is Gamma$\left(1 + \sum_{i=1}^{n} x_i,\ n + 1\right)$.
(b) Obtain the Bayes estimate of $\lambda$.


11.2.6. Let $X_1, X_2, \ldots, X_n$ be Poisson random variables with parameter $\lambda$. Assume that $\lambda$ has a
Gamma($\alpha$, $\beta$) prior.
(a) Compute the posterior distribution of $\lambda$.

(b) Obtain the Bayes estimate of $\lambda$.

(c) Compare the MLE of $\lambda$ with the Bayes estimate of $\lambda$.
(d) Which of the two estimates is better? Why?


11.2.7. Let $X_1, X_2, \ldots, X_n$ be Poisson random variables with parameter $\lambda$. Assume that $\lambda$ has an
exponential prior distribution with $\theta = 1$.
(a) Compute the posterior distribution of $\lambda$.
(b) Show that the Bayes estimate of $\lambda$ is Gamma$\left(\sum_{i=1}^{n} x_i + 1,\ n + 1\right)$.


11.2.8. It is known that a certain disease has affected 10% of a population. In a random sample of
50 patients typical of the disease group who are exposed to a new treatment, we observe
that 12 patients were hospitalized in a year. Let $\mu$ be the population rate of those who need
hospitalization. Assume that

$\mu \sim$ Gamma(0.1, 2) and $f(x|\mu) \sim$ Poi($50\mu$).

Given that 0.24 is an observation from $f(x|\mu)$, find the Bayesian estimator of $\mu$ (that is,
obtain $E(\mu|x)$).


11.2.9. Let X1,...,X, bean N(y, 2) random sample with prior m() having N(0, 07) distribution 
with known o7. Obtain the posterior distribution of ju. 


11.2.10. Let X1,..., X, bean N(u, 1) random sample with prior z() having the pdf [1/z (1 + )]. 
Show that the posterior 


ve 1 
x 
2 1 


risls) <a | 


11.3 BAYESIAN CONFIDENCE INTERVAL OR CREDIBLE INTERVALS 


In this section, we want to study the question, "Can we construct an interval where we are confident
that the interval contains the unknown true value of $\theta$?" We have seen how in many situations it
may be preferable to use an interval estimate instead of a point estimate for a population parameter
$\theta$. Such intervals in classical statistics were called confidence intervals. We can extend the concept
of interval estimation to a Bayesian setting. The Bayesian analog of a confidence interval is called a
credible interval and is defined as follows.


Definition 11.3.1 A $100(1-\alpha)\%$ credible interval for $\theta$ is an interval (a, b) such that

$$P(a < \theta < b \mid x_1, \ldots, x_n) \ge (1-\alpha)100\%.$$

Here $\alpha$ is a given small positive number between 0 and 1, and $x_1, \ldots, x_n$ are the sample values.


Note that we read this definition backwards, that is, we are at least $(1-\alpha)100\%$ confident that the
true value of $\theta$ is between a and b, given the sampled information.


FIGURE 11.4 Credible interval for $\theta$. (The endpoints a and b are marked on the $\theta$ axis.)


Because the conditional distribution of $\theta$ given $X_1, \ldots, X_n$ is actually a probability distribution, it
makes sense to talk about the probability that $\theta$ is in the interval (a, b). Once we have observed
data, the credible interval is fixed while $\theta$ is random. This is in contrast to the classical confidence
interval where the interval is random but $\theta$ is a fixed parameter. In the classical case, we would say,
"In the long run, $100(1-\alpha)\%$ of all such intervals will contain the true parameter $\theta$." In the Bayesian
approach, we would say, "The probability is at least $100(1-\alpha)\%$ that $\theta$ lies within the specified
interval (a, b)."


As in the classical case, it would be desirable to minimize the length of the credible interval. This
entails choosing only those points with highest values in the density $f(\theta|x_1, \ldots, x_n)$, as shown in
Figure 11.4.


Definition 11.3.1 can be rephrased as follows using the posterior distribution of 6. 


Definition 11.3.2 A $100(1-\alpha)\%$ credible interval for $\theta$ is an interval (a, b) such that

1. $\int_{a}^{b} f(\theta|x_1, \ldots, x_n)\,d\theta \ge 1-\alpha$, if $\theta$ is continuous, and the posterior pdf of $\theta$ is $f(\theta|x_1, \ldots, x_n)$.

2. $\sum_{\theta=a}^{b} f(\theta|x_1, \ldots, x_n) \ge 1-\alpha$, if $\theta$ is discrete.

We will now give some examples for computing credible intervals.


Example 11.3.1
Suppose $X_1, \ldots, X_n$ is a random sample from $N(\mu, \sigma^2)$ with $\sigma^2 = 4$. Suppose the prior pdf of $\mu$ is N(0, 1),
that is, $\pi(\mu) \sim N(0, 1)$. Find a 95% credible interval for $\mu$.


Solution 
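We have seen from Example 11.2.8 that the posterior distribution of $\mu$ given $x_1, \ldots, x_n$ is normally distributed.
With $\mu_0 = 0$, $\sigma_0^2 = 1$, and $\sigma^2 = 4$, the formulas of that example give

$$\text{Mean} = \frac{\dfrac{n}{4}\,\bar{x}}{\dfrac{n}{4} + 1} = \frac{n\bar{x}}{n+4}$$

and

$$\text{Variance} = \frac{1}{\dfrac{n}{4} + 1} = \frac{4}{n+4}.$$


FIGURE 11.5 Posterior distribution of $\mu$.


Figure 11.5 represents the posterior distribution of $\mu$.
To find the 95% credible interval for $\mu$, we have to find two numbers a and b such that

$$P(a < \mu < b \mid x_1, \ldots, x_n) = 0.95,$$

where

$$\mu \mid x_1, \ldots, x_n \sim N\left(\frac{n\bar{x}}{n+4},\ \frac{4}{n+4}\right).$$

Choosing the limits symmetric about the posterior mean and using z-scores (the posterior is continuous), we get

$$P\left(-z_{\alpha/2} < \frac{\mu - \dfrac{n\bar{x}}{n+4}}{\sqrt{\dfrac{4}{n+4}}} < z_{\alpha/2}\right) = 1-\alpha,$$

which can be rearranged as

$$P\left(\frac{n\bar{x}}{n+4} - z_{\alpha/2}\sqrt{\frac{4}{n+4}} < \mu < \frac{n\bar{x}}{n+4} + z_{\alpha/2}\sqrt{\frac{4}{n+4}}\right) = 1-\alpha.$$

Thus, a 95% credible interval for $\mu$ is

$$\left(\frac{n\bar{x}}{n+4} - z_{\alpha/2}\sqrt{\frac{4}{n+4}},\ \ \frac{n\bar{x}}{n+4} + z_{\alpha/2}\sqrt{\frac{4}{n+4}}\right), \quad \text{with } z_{\alpha/2} = z_{0.025} = 1.96.$$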

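For convenience, we summarize this procedure in the following steps.

BAYESIAN CREDIBLE INTERVAL PROCEDURE
1. Consider $\theta$ as a random variable with prior pdf (or pmf) $\pi(\theta)$.
2. Update the prior distribution $\pi(\theta)$ using Bayes' theorem. That is, find the posterior distribution of
   $\theta$ by the formula

$$\pi(\theta|\text{data}) = \begin{cases}
\dfrac{f(\text{data}|\theta)\,\pi(\theta)}{\int f(\text{data}|\theta)\,\pi(\theta)\,d\theta}, & \text{if continuous} \\[2ex]
\dfrac{f(\text{data}|\theta)\,\pi(\theta)}{\sum_{\theta} f(\text{data}|\theta)\,\pi(\theta)}, & \text{if discrete.}
\end{cases}$$

3. Find two numbers a and b such that

$$\int_{a}^{b} \pi(\theta|\text{data})\,d\theta \ge 1-\alpha, \quad \text{if continuous}$$

$$\sum_{\theta=a}^{b} \pi(\theta|\text{data}) \ge 1-\alpha, \quad \text{if discrete.}$$

   Note: The numbers a and b are found such that

$$\int_{-\infty}^{a} \pi(\theta|\text{data})\,d\theta = \alpha/2, \quad \text{if continuous}$$

$$\sum_{\theta < a} \pi(\theta|\text{data}) = \alpha/2, \quad \text{if discrete}$$

   and

$$\int_{b}^{\infty} \pi(\theta|\text{data})\,d\theta = \alpha/2, \quad \text{if continuous}$$

$$\sum_{\theta > b} \pi(\theta|\text{data}) = \alpha/2, \quad \text{if discrete.}$$

4. The $(1-\alpha)100\%$ credible interval for $\theta$ is the interval (a, b).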


In the discrete case, an easy way of finding a credible interval of smallest length is to arrange the
values of $\theta$ from most likely to least likely (that is, in the order of the magnitude of the posterior
probabilities), and then put values of $\theta$ into the interval until the cumulative posterior probability
of the set exceeds $(1-\alpha)100\%$. Such an interval is called a highest posterior density (HPD)
interval. It can be shown that the HPD interval always exists, and it is unique, so long as for all
intervals of probability $(1-\alpha)$, the posterior density is never uniform in any interval of values of $\theta$.
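
The ranking procedure just described can be sketched in a few lines. The code below is an illustration under stated assumptions: it uses the rounded posterior probabilities of Table 11.1 as input, stops as soon as the cumulative probability reaches the requested level, and uses a hypothetical function name.

```python
def discrete_hpd(values, probabilities, level):
    """Collect values from most to least probable until the cumulative
    posterior probability reaches `level`; return the set and its coverage."""
    ranked = sorted(zip(values, probabilities), key=lambda pair: pair[1], reverse=True)
    chosen, cumulative = [], 0.0
    for value, prob in ranked:
        chosen.append(value)
        cumulative += prob
        if cumulative >= level:
            break
    return sorted(chosen), cumulative

thetas = [0.80, 0.82, 0.84, 0.86, 0.88, 0.90]
posterior = [0.11029, 0.14208, 0.22528, 0.26610, 0.15817, 0.098064]  # Table 11.1, rounded

hpd_set, coverage = discrete_hpd(thetas, posterior, level=0.90)
```

With these inputs the selected set is {0.80, 0.82, 0.84, 0.86, 0.88}, with coverage of about 0.90192.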


Example 11.3.2 
For the data of Example 11.2.1, find a 90% credible interval for $\theta$.


Solution

Arranging the values of $\theta$ from most likely to least likely, we have Table 11.3. Looking at the "cumulative
probability" column, we see that the probability that $\theta$ is in the set {0.86, 0.84, 0.88, 0.82, 0.80} is 0.90192.
So this set is a 90% probability (or credible) interval for $\theta$.


Table 11.3

Prior value    Posterior probability    Cumulative
of $\theta$    of $\theta$              probability
0.86           0.26610                  0.26610
0.84           0.22528                  0.49138
0.88           0.15817                  0.64955
0.82           0.14208                  0.79163
0.80           0.11029                  0.90192
0.90           0.098064                 0.999984


EXERCISES 11.3


11.3.1. (a) Suppose $X_1, \ldots, X_n$ is a random sample from $N(\mu, \sigma^2)$ with $\sigma^2 = 9$. Suppose the prior
pdf of $\mu$ is N(0, 1); that is, $\pi(\mu) \sim N(0, 1)$. Find a 95% credible interval for $\mu$.
(b) The following is a set of random data from a normal distribution with variance 9.


 0.92   1.05   5.53   3.64  -4.47  -2.60   0.71  -3.66   1.38   3.87
 7.42   1.76   0.01   2.69   1.54   3.97   1.34   1.63   1.24   4.78


Using the results of part (a), compute a 95% credible interval for $\mu$, interpret its meaning,
and state any assumptions you have made.


11.3.2. Suppose that a person believes that his last year's weight was normally distributed with
mean of 165 lb and standard deviation of 5 lb. That is, the prior pdf of $\mu$ is N(165, 25), or
$\pi(\mu) \sim N(165, 25)$. He expects his current weight X is normally distributed with mean $\mu$
and standard deviation 7 lb. Following are 10 random measurements (in pounds) from this
year.


176 165 180 172 175
179 166 177 184 183


Find a 95% credible interval for $\mu$.


11.3.3. It is known that a certain disease affects 10% of a population. In a random sample of 50
patients in the disease group who are exposed to a new treatment, we observe that 12 patients
were hospitalized in a year. Let $\mu$ be the population rate that needs hospitalization in a year.
Assume $\mu$ has a Gamma(0.1, 2) prior. Let $\mu \sim$ Gamma(0.1, 2) and $f(x|\mu) \sim$ Poi($50\mu$).
Given that x = 0.24 is an observation of X, find a 95% credible interval for $\mu$. Obtain a
Bayesian credible interval for $\mu$. (If X is the number of patients admitted in a year, assume
$X \sim$ Poi($50\mu$), the Poisson approximation of the binomial.) How can we improve on this
estimate?


11.3.4. For an upcoming congressional election, suppose we want to estimate the amount of support
for a particular candidate in a district. By previous experience and voter registration data, we
can assume that the prior distribution of the proportion of support, $p$, is a beta distribution
with $\alpha = 10$ and $\beta = 8$ (i.e., $\pi(p) \sim$ Beta(10, 8)). We conducted a survey of 1000 randomly
selected voters, of whom 600 support the candidate. Obtain a 95% credible interval for $p$.
What will happen to the credible interval if we reduce the credibility level below 95%? What will
happen to the 95% credible interval if we increase the sample size?


11.3.5. It is recommended that the daily intake of sodium be 2400 mg per day. From a previous 
study on a particular ethnic group, the prior distribution of sodium intake is believed to be 
normal with mean 2700 mg and standard deviation 250 mg. If a recent survey for this group
resulted in a mean of 3000 mg and standard deviation of 300 mg, obtain a 95% credible 
interval for the mean intake of sodium for this ethnic group. 


11.3.6. Suppose we have a coin (not necessarily balanced) with p being the probability of heads. 
Assume a uniform prior for p. Suppose in 20 tosses of this coin, we obtained 12 heads. 
Obtain a 90% credible interval for p. 


11.3.7. Suppose that in a particular telephone exchange, the number of calls received per minute has
a Poisson distribution with parameter $\lambda$. Assume an exponential prior for $\lambda$ with parameter 2.
Suppose this exchange had received 25 calls in five minutes. Obtain a 95% credible interval
for $\lambda$.


11.4 BAYESIAN HYPOTHESIS TESTING 


The Bayesian approach to hypothesis testing for simple hypotheses is pretty straightforward. Deciding 
between two hypotheses for a given set of data x reduces to computing their posterior probabilities. 
If an explicit loss function is available, the Bayes rule is chosen to minimize the expected value of 
the loss function with respect to the posterior distribution. In the absence of a loss function, the 
probabilities of type I and type II errors are of little interest to the Bayesian. 


In the classical hypothesis testing, we test a null hypothesis (denoted by $H_0$) against an alternative
hypothesis (denoted by $H_1$ or $H_a$). The test procedure is based on controlling the two types of errors:
type I and type II. The classical test procedures limit the probability of type I error to $\alpha$ and
minimize the probability of type II error. If the type II error is unacceptably high, it is reduced by
increasing the sample size.


In the Bayesian approach, the problem of deciding between the null and alternative is rather
straightforward. Consider the problem of hypothesis testing with


$$H_0 : \theta \in \Theta_0 \quad \text{vs.} \quad H_1 : \theta \in \Theta_1 \qquad (11.3)$$


where $\Theta_0$, $\Theta_1$ are subsets of the real line. Let $X_1, \ldots, X_n$ be the sample from a population with pdf
$f_\theta(x)$.


In the Bayesian hypothesis testing approach we compute the following posterior probabilities:

$$\alpha_0 = P(\theta \in \Theta_0 \mid x_1, \ldots, x_n) \qquad (11.4)$$


and

$$\alpha_1 = P(\theta \in \Theta_1 \mid x_1, \ldots, x_n). \qquad (11.5)$$


If $\alpha_0 > \alpha_1$, we accept the null hypothesis, and if $\alpha_0 < \alpha_1$, we reject the null hypothesis. We now
outline the Bayes hypothesis testing procedure for testing hypothesis (11.3).


Let $\pi(\theta)$ be the prior. Also, let

$$\pi_0 = P(\theta \in \Theta_0)$$


and

$$\pi_1 = P(\theta \in \Theta_1).$$


Definition 11.4.1 The ratio $\pi_0/\pi_1$ is called the prior odds ratio. The ratio $\alpha_0/\alpha_1$ (see Equations (11.4)
and (11.5)) is called the posterior odds ratio.


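The posterior odds ratio is the ratio of the posterior probabilities, given the data, of the null and
alternative hypotheses. The posterior odds ratio will be used in decision making for testing the hypotheses.
We now compute $\alpha_0$ and $\alpha_1$ using the Bayes theorem. That is,


$$\alpha_0 = P(\theta \in \Theta_0 \mid x_1, \ldots, x_n) =
\begin{cases}
\displaystyle\int_{\Theta_0} f(\theta \mid x_1, \ldots, x_n)\, d\theta, & \text{if continuous} \\[2ex]
\displaystyle\sum_{\theta \in \Theta_0} f(\theta \mid x_1, \ldots, x_n), & \text{if discrete.}
\end{cases}$$


Similarly,


$$\alpha_1 = P(\theta \in \Theta_1 \mid x_1, \ldots, x_n) =
\begin{cases}
\displaystyle\int_{\Theta_1} f(\theta \mid x_1, \ldots, x_n)\, d\theta, & \text{if continuous} \\[2ex]
\displaystyle\sum_{\theta \in \Theta_1} f(\theta \mid x_1, \ldots, x_n), & \text{if discrete.}
\end{cases}$$


We reject $H_0$ if the odds ratio $\alpha_0/\alpha_1 < 1$ and accept $H_0$ if $\alpha_0/\alpha_1 > 1$.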
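
The discrete case of this computation can be sketched in a few lines of Python. The coin example below
(three candidate values of the parameter, a uniform prior, and 7 heads in 10 tosses) is hypothetical and is
not taken from the text; the function name `posterior_odds_discrete` is likewise only illustrative.

```python
from math import comb

def posterior_odds_discrete(thetas, prior, likelihood, null_set):
    """Return (alpha0, alpha1, alpha0/alpha1) for a parameter with finitely many values."""
    # Bayes theorem: the posterior is proportional to prior times likelihood.
    joint = [p * likelihood(t) for t, p in zip(thetas, prior)]
    total = sum(joint)
    posterior = [j / total for j in joint]
    # alpha0 sums the posterior over the null set; alpha1 is the complement.
    alpha0 = sum(q for t, q in zip(thetas, posterior) if t in null_set)
    alpha1 = 1.0 - alpha0
    return alpha0, alpha1, alpha0 / alpha1

# Hypothetical setup: P(heads) is one of 0.3, 0.5, 0.7, each equally likely a priori.
thetas = [0.3, 0.5, 0.7]
prior = [1 / 3, 1 / 3, 1 / 3]

def binomial_likelihood(p, heads=7, tosses=10):
    """Probability of the assumed data (7 heads in 10 tosses) given P(heads) = p."""
    return comb(tosses, heads) * p**heads * (1 - p) ** (tosses - heads)

# H0: theta in {0.3, 0.5}  vs.  H1: theta = 0.7
alpha0, alpha1, odds = posterior_odds_discrete(
    thetas, prior, binomial_likelihood, null_set={0.3, 0.5}
)
print(f"alpha0 = {alpha0:.4f}, alpha1 = {alpha1:.4f}, odds = {odds:.4f}")
print("reject H0" if odds < 1 else "accept H0")
```

With these assumed numbers the posterior odds ratio falls below 1, so the rule stated above rejects the null
hypothesis.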


586 CHAPTER 11 Bayesian Estimation and Inference 


This method of hypothesis testing is called Jeffreys' hypothesis testing criterion. It basically says that
if the posterior odds ratio is greater than 1, we accept the null hypothesis; otherwise, we reject the
null in favor of the alternative hypothesis.


Because the probability of any single value of a continuous variable is zero, a simple null hypothesis
of the form $\theta = \theta_0$, for some specified value $\theta_0$, cannot be dealt with easily in the
Bayesian framework. Hence, unlike the classical framework, here we mostly
deal with composite hypotheses for both null and alternative.




Example 11.4.1 

A student taking a standardized test is classified as gifted if he or she scores at least 100 out of a possible
score of 150. Otherwise the student is classified as not gifted. Suppose the prior distribution of the scores
of all students is normal with mean 100 and standard deviation 15. It is believed that scores will vary each
time the student takes the test and that these scores can be modeled as a normal distribution with mean $\mu$
and variance 100. Suppose the student takes the test and scores 115. Test the hypothesis that the student
can be classified as a gifted student.


Solution 
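Writing $\theta$ for the mean of the student's score distribution, the hypothesis testing problem can be
phrased as


$$H_0 : \theta < 100 \quad \text{vs.} \quad H_a : \theta \geq 100.$$


Referring to Example 11.2.8, we know that the posterior distribution $f(\theta \mid x)$ is normal with mean
110.4 and variance 69.2. Because the prior is $N(100, 225)$, we have $\pi_0 = P(\theta < 100) = 1/2$ and
$\pi_1 = P(\theta \geq 100) = 1/2$.

We can now compute


$$\alpha_0 = P(\theta < 100 \mid x = 115)
= P\left(\frac{\theta - 110.4}{\sqrt{69.2}} < \frac{100 - 110.4}{\sqrt{69.2}}\right)
= P\left(Z < \frac{-10.4}{\sqrt{69.2}}\right) = 0.106$$


and


$$\alpha_1 = P(\theta \geq 100 \mid x = 115) = 1 - P(\theta < 100 \mid x = 115) = 1 - 0.106 = 0.894.$$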


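Thus, $\alpha_0/\alpha_1 = 0.106/0.894 = 0.119 < 1$, and we reject $H_0$.


BAYESIAN HYPOTHESIS TESTING PROCEDURE

To test $H_0 : \theta \in \Theta_0$ vs. $H_1 : \theta \in \Theta_1$, where $\Theta_0$ and $\Theta_1$ are given sets:
1. Consider $\theta$ as a random variable with prior distribution $\pi(\theta)$.
2. Compute the posterior distribution $f(\theta \mid x_1, \ldots, x_n)$ of $\theta$ given $x_1, \ldots, x_n$, using Bayes' theorem.
3. Compute $\alpha_0$ and $\alpha_1$ using the following formulas:


$$\alpha_0 = P(\theta \in \Theta_0 \mid x_1, \ldots, x_n) =
\begin{cases}
\displaystyle\int_{\Theta_0} f(\theta \mid x_1, \ldots, x_n)\, d\theta, & \text{if continuous} \\[2ex]
\displaystyle\sum_{\theta \in \Theta_0} f(\theta \mid x_1, \ldots, x_n), & \text{if discrete}
\end{cases}$$


and

$$\alpha_1 = P(\theta \in \Theta_1 \mid x_1, \ldots, x_n) =
\begin{cases}
\displaystyle\int_{\Theta_1} f(\theta \mid x_1, \ldots, x_n)\, d\theta, & \text{if continuous} \\[2ex]
\displaystyle\sum_{\theta \in \Theta_1} f(\theta \mid x_1, \ldots, x_n), & \text{if discrete.}
\end{cases}$$


4. Reject $H_0$ if the posterior odds ratio $\alpha_0/\alpha_1 < 1$. Otherwise accept $H_0$.

In the foregoing procedure, we assume that $P(\theta \in \Theta_0)$ and $P(\theta \in \Theta_1)$ are both greater than zero.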

EXERCISES 11.4 


11.4.1. The following is random data from a normal distribution with variance 9. 


0.92 1.05 5.53 3.64 —4.47 —2.60 0.71 —3.66 1.38 3.87 
7.42 1.76 0.01 2.69 1.54 3.97 1.34 1.63 1.24 4.78 


(a) Test the hypothesis $H_0 : \mu \leq 0$ vs. $H_1 : \mu > 0$. Assume that the prior is $N(0, 4)$, so
that $\mu \leq 0$ and $\mu > 0$ are equally probable.
(b) Compare your decision with classical hypothesis testing, with $\alpha = 0.05$.


11.4.2. (a) For the data of Exercise 11.3.2, using the Bayesian method, test the hypothesis
$H_0 : \mu \leq 170$ vs. $H_a : \mu > 170$.
(b) Compare your decision with classical hypothesis testing, with $\alpha = 0.05$.


11.4.3. It is known that a certain disease affects 10% of a population. Of a random sample of
50 patients in the disease group who are exposed to a new treatment, we observe that 12
patients were hospitalized in a year. Let $\mu$ be the population rate that needs hospitalization
in a year. Assume $\mu$ has a Gamma(0.1, 2) prior; that is, $\mu \sim$ Gamma(0.1, 2) and
$f(x \mid \mu) \sim$ Poi$(50\mu)$. Given that $x = 0.24$ is an observation of $X$, test the hypothesis
$H_0 : \mu \leq 0.10$ vs. $H_a : \mu > 0.10$. (If $X$ is the number of patients admitted in a year, assume
$X \sim$ Poi$(50\mu)$, the Poisson approximation of the binomial.)


11.4.4. For an upcoming congressional election, suppose we want to estimate the amount of
support for a particular candidate in a district. By previous experience and voter registration
data, we can assume that the prior distribution of the proportion of support, $p$, is a beta
distribution with $\alpha = 10$ and $\beta = 8$ (i.e., $\pi(p) \sim$ Beta(10, 8)). We conducted a survey of
1000 randomly selected voters, of whom 600 support the candidate. Test the hypothesis
$H_0 : p \geq 0.60$ vs. $H_a : p < 0.60$.


11.4.5. For the data of Exercise 11.3.5, test the hypothesis $H_0 : \mu \leq 2400$ mg vs. $H_1 : \mu > 2400$ mg
for this ethnic group.


11.4.6. Suppose we have a coin (not necessarily balanced) with p being the probability of heads. 
Assume a uniform prior for p. Suppose in 20 tosses of this coin, we obtained 12 heads. 
Test the hypothesis $H_0 : p \leq 0.50$ vs. $H_1 : p > 0.50$.


11.5 BAYESIAN DECISION THEORY 


Bayesian methods in general are more concerned with problems of decision making than with prob- 
lems of inference. Decision theory, as the name implies, is concerned with the problem of making 
decisions. Statistical decision theory is concerned with optimal decision making under uncertainty 
or when statistical knowledge is available only on some of the uncertainties involved in the deci- 
sion problem. Uncertainty could be about the true value related to the decision, or, uncertainty 
could be about the actual state of the nature. Abraham Wald (1902-1950) laid the foundation for 
statistical decision theory. Original works on the decision theory emerged out of game theory con- 
siderations. Many books and articles have been written on the various aspects of decision theory. The 
Bayesian approach to the decision theory was introduced by Leonard Jimmie Savage in 1954. In this 
section, we introduce the general idea of decision theory. We basically deal with analytical procedures 
for the decision-making process. This will involve selection of an optimum decision from a choice 
of courses of action among two or more alternatives. The Bayesian decision theory quantifies the 
trade-offs between different decisions using costs and probabilities that accompany such decisions. 


Consider, as an example, a company deciding whether or not to market a new brand of toothpaste 
with a whitening agent. Clearly many factors will affect the decision (for example, the proportion of 
people who are likely to switch to the new brand, and the likelihood of other competing companies 
introducing similar toothpastes). These factors are generally unknown, but estimates can be obtained 
from statistical investigations. 


The classical statistical approach relies exclusively on the data obtained from these statistical inves- 
tigations, ignoring other relevant information such as the company’s past experiences in marketing 
similar products. Statistical decision theory tries to combine other relevant information with the 
sample information to arrive at the optimal decision. Therefore, a Bayesian setting seems to be more 
appropriate for decision theory.


One piece of relevant information that decision theory considers is the possible consequences of the
decisions. Often these consequences can be quantified. That is, the loss or gain of each decision can 
be expressed as a number (called the loss or utility). A loss or utility to a decision maker is the effect 
of the interaction of two factors: (1) the decision or action selected by the decision maker; and (2) 
the event or state of the world that actually occurs. Classical statistics does not explicitly use a loss 
function or a utility (payoff) function. 


A second source of information that decision theory utilizes is the prior information. Prior informa- 
tion could be based on past experiences of similar situations or on expert opinion. We can follow the 
procedure explained next as a guideline for decision making. 


GENERAL DECISION THEORY PROCEDURE

1. Identify the objectives of the decision-making process.

2. Identify the set of actions and set of possible events (states of nature).

3. Assign probabilities to the occurrence of each possible state of nature (prior). If more observations
are available, calculate the posterior probabilities of the occurrence of each possible state of
nature.

4. For each possible event, assign a numerical value to the anticipated payoff (or loss) of each course
of action.

5. Compute the expected value of the payoffs (utility or loss function). This could be done by either
using the prior probabilities if there are no observations, or using the posterior probabilities.

6. Select the optimum decision among the available alternative courses of action that maximizes the
expected value of the payoffs.


We now consider an example to illustrate the idea of statistical decision making. 


Example 11.5.1 
Suppose you own a small stall at a flea market that is open only on weekends. If the weather is good, you 
make a profit of $200, and if it is bad, you close your stall and you make no (zero) profit. However, you have 
the option of buying, from an insurance company, weather insurance that costs $75. The company pays 
you $210 if the weather is bad. Suppose you believe that the probability of good weather on a particular 
weekend is p. Compute the expected gain if you insure and if you do not. What is the best course of action? 
Arrive at a decision. 


Solution 

From the information in the problem, we can obtain the utility (gain or profit) table shown in Table 11.4,
based on our decision to insure or not insure. Suppose that we model the state of weather as good or bad
by means of a random variable $\theta$ defined as follows.


$$\theta = \begin{cases} 1, & \text{if the weather is good} \\ 0, & \text{if the weather is bad.} \end{cases}$$


Table 11.4

                                 Weather (parameter space)
Decision space D                 Good ($\theta_1$)      Bad ($\theta_2$)
Insurance (I) ($d_1$)            $125 (200 - 75)        $135 (210 - 75)
No insurance (NI) ($d_2$)        $200                   $0


Suppose for our example we believe that during a particular weekend $P(\theta = 1) = p$ and $P(\theta = 0) = 1 - p$.
This can be considered as prior information. The different values of $\theta$ are called states of nature. We assign
(perhaps subjectively) a probability structure for the states of nature defined by a prior distribution $\pi(\theta)$. Now
we can compute the expected gain when we insure and when we do not.

Using the values in the table,


$$\text{Expected gain given we insure} = (125)\,p + (135)(1 - p) = 135 - 10p$$


$$\text{Expected gain when we do not insure} = (200)\,p + (0)(1 - p) = 200p.$$


Hence, insurance is preferable if

$$135 - 10p > 200p$$


or


$$p < \frac{135}{210} = 0.643.$$

That is, we should take the insurance if we believe the probability of good weather is less than 0.643.
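
A short Python sketch makes the comparison concrete. The payoffs are those of Table 11.4; the three trial
values of $p$ are arbitrary choices for illustration, and the function name `expected_gains` is not from the
text.

```python
def expected_gains(p):
    """Expected gain (in dollars) with and without insurance when P(good weather) = p."""
    insure = 125 * p + 135 * (1 - p)      # payoffs from the Insurance row of Table 11.4
    no_insure = 200 * p + 0 * (1 - p)     # payoffs from the No Insurance row
    return insure, no_insure

# Break-even probability: 135 - 10p = 200p  gives  p = 135/210.
break_even = 135 / 210

for p in (0.5, break_even, 0.8):          # illustrative values of p
    insure, no_insure = expected_gains(p)
    choice = "insure" if insure > no_insure else "do not insure"
    print(f"p = {p:.3f}: insure = {insure:.2f}, no insurance = {no_insure:.2f} -> {choice}")
```

Below the break-even value the insured stall has the larger expected gain; above it the uninsured stall
does, which is the conclusion reached in the example.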


In general the states of nature are represented by $\theta_1, \ldots, \theta_n$ and the possible decisions (actions)
are represented by $d_1, \ldots, d_m$. Let $U(d_j, \theta_i)$ represent the net gain when the true state of nature is $\theta_i$
and the decision $d_j$ is made. Then we can construct the general utility table shown in Table 11.5.


In Bayesian decision theory, we assume a probability distribution on the states of nature called the
prior distribution. Using this probability distribution, we can find the decision that maximizes the
expected utility. That is, let the states of nature be initially modeled by a random variable $\theta$ with
probability function $\pi(\theta)$ such that $P(\theta = \theta_i) = \pi(\theta_i)$, $i = 1, \ldots, n$. Let $U$ denote the utility. Then the
expected utility for decision $d_j$ is given by


$$E(U \mid d_j) = \sum_{i=1}^{n} U(d_j, \theta_i)\, \pi(\theta_i).$$


Table 11.5 (rows: decisions; columns: states of nature)

$$\begin{array}{c|cccccc}
 & \theta_1 & \theta_2 & \cdots & \theta_i & \cdots & \theta_n \\ \hline
d_1 & U(d_1, \theta_1) & U(d_1, \theta_2) & \cdots & U(d_1, \theta_i) & \cdots & U(d_1, \theta_n) \\
d_2 & \vdots & & & \vdots & & \vdots \\
\vdots & & & & & & \\
d_j & \vdots & & & U(d_j, \theta_i) & & \vdots \\
\vdots & & & & & & \\
d_m & U(d_m, \theta_1) & \cdots & & \cdots & & U(d_m, \theta_n)
\end{array}$$

The optimal decision, called the Bayes decision, denoted by $d^*$, is that which maximizes the expected
utility. That is, $d^*$ satisfies the following equation:


$$\max_{j} \sum_{i=1}^{n} U(d_j, \theta_i)\, \pi(\theta_i) = \sum_{i=1}^{n} U(d^*, \theta_i)\, \pi(\theta_i).$$


This procedure is called the Bayes decision procedure with respect to the assumed or given prior
$\pi(\theta_i)$, $i = 1, 2, \ldots, n$.


PROCEDURE TO FIND OPTIMAL DECISION
1. For each decision $d_j$, compute $\sum_{i=1}^{n} U(d_j, \theta_i)\, \pi(\theta_i)$.
2. Find a decision $d^*$ from the decision space that maximizes the sum in step 1. This is the Bayes
decision.
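
The two steps translate directly into code. The following is a minimal sketch, with the utility table held as
a dictionary keyed by decision; it is applied to the payoffs of Table 11.4 under an assumed prior of 1/2 for
each state of nature. The function name and the data layout are illustrative choices, not part of the text.

```python
def bayes_decision(utility, prior):
    """Return (d_star, expected) where expected[d] is the expected utility of decision d.

    utility: dict mapping each decision to a list of utilities, one per state of nature
    prior:   list of probabilities for the states of nature, in the same order
    """
    # Step 1: expected utility of every decision under the prior.
    expected = {
        d: sum(u * p for u, p in zip(row, prior)) for d, row in utility.items()
    }
    # Step 2: the Bayes decision maximizes the expected utility.
    d_star = max(expected, key=expected.get)
    return d_star, expected

# Table 11.4, states ordered as (good weather, bad weather).
utility = {"insure": [125, 135], "no insurance": [200, 0]}
prior = [0.5, 0.5]   # assumed prior, for illustration only

d_star, expected = bayes_decision(utility, prior)
print(expected, "->", d_star)
```

Replacing `prior` with posterior probabilities gives the Bayes decision after observables have been taken
into account.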


In determining the Bayes decision, we have assumed a prior distribution $\pi(\theta)$ for the states of nature
$\{\theta_i\}$. Naturally the question arises: Can there be information or observations that will help us to
determine $\pi(\theta)$?


Definition 11.5.1 Observations that can aid us in determining the relative likelihoods of the possible states 
of nature are called observables. 


We remark that observables enable us to refine and update our initial prior $\pi(\theta)$. The updated prior
is the conditional distribution $\pi(\theta \mid \text{observables})$, which clearly depends on the observables as well as
the initial prior $\pi(\theta)$. The updated prior is also called the posterior.


For example, to determine the nature of weather we may hear the weather forecast (80% chance of
rain), in which case we may assume $P(G) = 0.2$ and $P(B) = 0.8$. However, the weather forecast is
not perfect. Let $\hat{G}$ and $\hat{B}$ denote the meteorologist's prediction of good and bad weather,
respectively. We may like to know $P(G \mid \hat{G})$ and $P(G \mid \hat{B})$. That is, what is the probability of the
weather being good when the meteorologist predicts the weather will be good, and what is the probability
that the weather is good when the meteorologist predicts the weather will be bad?


It may be noted that there is no direct cause-effect relation in $G \mid \hat{G}$. That is, the prediction of the
weather forecast does not influence the weather. If a probability distribution depends on a set of
parameters $\theta$, the classical approach estimates $\theta$ on the basis of an observed sample $X_1, \ldots, X_n$.
The samples $X_1, \ldots, X_n$ are the observables. Thus, observables are used to estimate the parameters,
that is, we want the distribution of $\theta$ given $X_1, \ldots, X_n$, or $p(\theta \mid X_1, \ldots, X_n)$. In our weather
situation, the observable is the weather forecast, whereas the parameter is one of the weather
conditions, good or bad. In $P(\hat{G} \mid G)$ we are asking, "Given that the weather is good, what is
the probability that the weather forecast is correct?" We can imagine that meteorological conditions
such as the barometric pressure determine the weather (that is, $G = f(m_1, \ldots, m_k)$,
$m_i$ = meteorological factor), and in this sense we can consider that $G$ is a parameter. We thus want
$P(G \mid \hat{G})$.


To compute the posterior $P(G \mid \hat{G})$, we use the Bayes theorem (which needs a prior distribution, $P(G)$).
That is,


$$P(G \mid \hat{G}) = \frac{P(\hat{G} \mid G)\,P(G)}{P(\hat{G} \mid G)\,P(G) + P(\hat{G} \mid B)\,P(B)}.$$


Similarly, we can compute $P(B \mid \hat{B})$.


Coming back to our weather situation, if $P(G)$ is known and $P(\hat{G} \mid G)$, $P(\hat{B} \mid B)$ are known, we could
obtain the required posterior distributions $P(G \mid \hat{G})$ and $P(B \mid \hat{B})$. We can now use this distribution to
calculate the expected utilities and choose the decision that maximizes the expected utility.


We now consider an example. 




Example 11.5.2 
Let us initially assume $P(\theta = 1) = P(\theta = 0) = \frac{1}{2}$. That is,


$$P(\text{good weather}) = P(\text{bad weather}) = \frac{1}{2}.$$


Suppose we have the following record on the meteorologist's predictions. The meteorologist predicts good
weather ($\hat{G}$), given the weather is good, 2/3 of the time, that is, $P(\hat{G} \mid G) = 2/3$, and predicts bad weather
($\hat{B}$), given the weather is bad, 3/4 of the time, that is, $P(\hat{B} \mid B) = 3/4$. Thus, given that the meteorologist
predicts good weather, what is the probability that the weather will turn out to be good, and given the
meteorologist predicts bad weather, what is the probability that the weather will turn out to be bad?


Solution 

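To compute the true probabilities, we use the Bayes theorem.

We are given $P(\hat{G} \mid G) = \frac{2}{3}$ and $P(\hat{B} \mid B) = \frac{3}{4}$, which imply $P(\hat{B} \mid G) = \frac{1}{3}$ and
$P(\hat{G} \mid B) = \frac{1}{4}$. Using the Bayes theorem, we obtain the posterior probability of $G$ as

$$P(G \mid \hat{G}) = \frac{P(\hat{G} \mid G)\,P(G)}{P(\hat{G} \mid G)\,P(G) + P(\hat{G} \mid B)\,P(B)}
= \frac{\left(\frac{2}{3}\right)\left(\frac{1}{2}\right)}{\left(\frac{2}{3}\right)\left(\frac{1}{2}\right) + \left(\frac{1}{4}\right)\left(\frac{1}{2}\right)} = \frac{8}{11},$$

and the posterior probability of $B$ is

$$P(B \mid \hat{B}) = \frac{P(\hat{B} \mid B)\,P(B)}{P(\hat{B} \mid B)\,P(B) + P(\hat{B} \mid G)\,P(G)}
= \frac{\left(\frac{3}{4}\right)\left(\frac{1}{2}\right)}{\left(\frac{3}{4}\right)\left(\frac{1}{2}\right) + \left(\frac{1}{3}\right)\left(\frac{1}{2}\right)} = \frac{9}{13}.$$

Thus, we have the following updated prior depending upon the meteorologist's prediction. The updated prior
when the meteorologist predicts good weather is


$$\pi(G) = P(G \mid \hat{G}) = \frac{8}{11}; \qquad \pi(B) = 1 - \pi(G) = \frac{3}{11}.$$


Thus, the updated $\pi(G)$ is actually $\pi_{\hat{G}}(G)$. Similarly, the updated prior when the meteorologist predicts bad
weather (that is, $\pi_{\hat{B}}(G)$) is


$$\pi(G) = P(G \mid \hat{B}) = \frac{4}{13}; \qquad \pi(B) = P(B \mid \hat{B}) = \frac{9}{13}.$$

That is, if the meteorologist predicts good weather, he will be right about 72.7% of the time, and if he predicts
bad weather, he will be right about 69.2% of the time.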
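
The two applications of the Bayes theorem in this example can be reproduced exactly with rational
arithmetic. The sketch below uses Python's `fractions` module; the function and argument names are
illustrative, and the inputs are the prior and forecast accuracies stated in Example 11.5.2.

```python
from fractions import Fraction

def posterior_given_forecast(prior_good, hit_good, hit_bad):
    """Return (P(good | forecast good), P(bad | forecast bad)).

    prior_good: P(G), the prior probability of good weather
    hit_good:   P(forecast good | weather good)
    hit_bad:    P(forecast bad  | weather bad)
    """
    prior_bad = 1 - prior_good
    false_good = 1 - hit_bad      # P(forecast good | weather bad)
    false_bad = 1 - hit_good      # P(forecast bad  | weather good)
    good_given_good = hit_good * prior_good / (
        hit_good * prior_good + false_good * prior_bad
    )
    bad_given_bad = hit_bad * prior_bad / (
        hit_bad * prior_bad + false_bad * prior_good
    )
    return good_given_good, bad_given_bad

p_good, p_bad = posterior_given_forecast(
    Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)
)
print(p_good, float(p_good))   # posterior of good weather after a good forecast
print(p_bad, float(p_bad))     # posterior of bad weather after a bad forecast
```

Working with `Fraction` avoids rounding, so the results can be compared directly with the fractions
obtained by hand.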


Example 11.5.3 
Consider Example 11.5.2, with the additional information that the meteorologist has predicted that the
weather will be good on a given weekend. Referring to the utility table (Table 11.4) given in Example 11.5.1,
we ask, what should be our decision, to insure or not to insure, in light of this prediction?


Solution 
From Example 11.5.2, we know that the updated prior, given that the meteorologist predicts good weather, is


$$\pi(G) = P(G \mid \hat{G}) = \frac{8}{11} \quad \text{and} \quad \pi(B) = P(B \mid \hat{G}) = \frac{3}{11}.$$


Using the foregoing prior and the utility table in Example 11.5.1 (Table 11.4), we can compute the following
expected gains:


$$\text{Expected gain if we insure} = (125)\,\pi(G) + (135)\,\pi(B) = (125)\frac{8}{11} + (135)\frac{3}{11} = 127.73$$


and

$$\text{Expected gain if we do not insure} = (200)\frac{8}{11} + (0)\frac{3}{11} = 145.45.$$


Therefore our decision, given that the meteorologist predicts good weather, is not to insure.
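
The same calculation can be sketched in Python for either forecast. The posterior probabilities are those
derived in Example 11.5.2 and the payoffs are those of Example 11.5.1; the case of a bad-weather forecast is
not worked in the text and is included here only as an illustration of the same rule.

```python
from fractions import Fraction

# Payoffs from Example 11.5.1, by decision and state of the weather.
utility = {
    "insure": {"good": 125, "bad": 135},
    "no insurance": {"good": 200, "bad": 0},
}

# Updated priors (posteriors) from Example 11.5.2, by forecast.
posterior = {
    "forecast good": {"good": Fraction(8, 11), "bad": Fraction(3, 11)},
    "forecast bad": {"good": Fraction(4, 13), "bad": Fraction(9, 13)},
}

def decide(weights):
    """Expected gain of each decision under the given weights, and the best decision."""
    expected = {
        d: sum(payoff[s] * weights[s] for s in weights) for d, payoff in utility.items()
    }
    return expected, max(expected, key=expected.get)

results = {forecast: decide(weights) for forecast, weights in posterior.items()}
for forecast, (expected, best) in results.items():
    gains = ", ".join(f"{d}: {float(v):.2f}" for d, v in expected.items())
    print(f"{forecast}: {gains} -> {best}")
```

Under these numbers a good forecast favors not insuring, in agreement with the example, while a bad
forecast favors insuring.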


EXERCISES 11.5 


11.5.1. Suppose that we will receive $25 if we get two consecutive heads (H) on two flips of a 
balanced coin. If only one head appears, we will get $10. On the other hand, if there is no 
heads, we will lose $15. If monetary return is the only concern, should we play this game? 
Why? 


11.5.2. In the previous problem, suppose we suspect the coin is not balanced. We feel that P(H) 
is only 0.4. In our last 10 observations, we counted three heads and seven tails. Should we 
play the game? Defend your answer. 


11.5.3. The owner of a small structural engineering firm in Tampa wants to open a new branch office
in Orlando. The single most influential factor is the projected state of the economy for the 
next 4 years. If the economy keeps expanding or at least does not take a turn for the worse, 
the owner expects an annual profit of $300,000 by opening the new office. If the economy 
experiences a downward trend, then the owner forecasts an annual loss of $200,000. If he 
just continues to operate his business in Tampa, he expects a $50,000 annual profit. Suppose 
a government forecast indicates that there is a 70% chance of economic expansion or status 
quo in the next 4 years and there is a 30% chance that the economy will show a decline. 
What is the optimal decision in this problem? Did you make any assumption in obtaining 
this optimal decision? 


11.5.4. In Exercise 11.5.3, suppose the owner decides to look at the accuracy of past forecasts by the 
government. Suppose his study indicates that a forecast of economic expansion came true 
only 2/3 of the time, whereas an economic downturn came true 4/5 of the time. Now based 
on this new evidence, what is the optimal option for the owner? 


11.5.5. Consider the weather Example 11.5.1, discussed earlier. The meteorologist’s prediction 
record over the past 15 days is as follows: 


Weather person's prediction:         G B B G G G B G G B B G B G G

How the weather turned out to be:


(a) Assuming a uniform distribution for the states of nature, obtain an updated prior
(posterior) based on the meteorologist's record.
(b) Obtain the Bayes decision.


11.5.6. A coin (not necessarily fair) will be tossed once, and you have to predict the outcome. If you

predict the outcome correctly you win $1000. Otherwise, you lose $5. 

(a) What are the states of nature? What is the decision space? Write the utility table. 

(b) Suppose that you believe that the probability of heads is 2/3. What is your prior for the
states of nature? Find the expected gains.

(c) Suppose that you are allowed to toss the coin twice and you find that the first toss results 
in heads and the second in tails. What are the observables? 

(d) Assume the situation in (c). The coin is going to be tossed again and you have to predict 
the outcome. What is your updated prior? 

(e) What are your expected gains, and what is your decision for the situation in (d)? 


11.5.7. We are given the following utility table: 


States of nature
        $\theta_1$   $\theta_2$   $\theta_3$
$d_1$ | 0 | 10
$d_2$ | -2 | 5 | 1

Determine the Bayes decision assuming a uniform prior for the states of nature. 


11.5.8. Suppose that we have an observable X that can take only two values, X; and X2, for the 
situation in Exercise 11.5.7. The distribution of X depends on the states of nature and is as 
follows: 


        $\theta_1$   $\theta_2$   $\theta_3$
$X_1$ | 0.1 | 0.5 | 0.6
$X_2$ | 0.9 | 0.5 | 0.4


That is, $P(X = x_1 \mid \theta_1) = 0.1$ or $P(X = x_2 \mid \theta_3) = 0.4$, and so forth.
Suppose you observe $X_1$; what is the updated prior? What is the Bayes decision?


11.5.9. A large lot has $p$% defectives and you have to predict $p$. If you predict $p$ correctly you gain
$\$g$, and if the prediction is wrong, you lose $\$l$. It is known that the possible values of $p$ are
$p_1, p_2, \ldots, p_k$.

(a) Set up a utility table.

(b) Suppose you assume a uniform prior for $p$. That is, $\pi(p_i) = \frac{1}{k}$, $i = 1, 2, \ldots, k$. Find an
expression for the Bayes decision.

(c) Suppose you have an observable $X$ such that $P(X = x_1 \mid p_i) = a_i$, $i = 1, 2, \ldots, k$, and
$P(X = x_2 \mid p_i) = 1 - a_i$, $i = 1, 2, \ldots, k$. Find the updated prior for $p$. What is the Bayes
decision in this case?


11.6 CHAPTER SUMMARY


In this chapter we introduced the basic philosophy, definitions, and methods of performing statistical
analysis in a Bayesian setting. Treating unknown parameters as random variables
provides a feedback mechanism to update our original beliefs about the parameter(s). The posterior
distribution of the parameter(s) represents our revised belief and is calculated by combining data
and prior knowledge. We also briefly studied some elements of decision theory, which has a natural
base in the Bayesian approach. It should be noted that there are various other aspects of Bayesian
analysis, such as Bayesian regression, in which priors are placed on the regression coefficients as
well as on the error variance. It is beyond the scope of one chapter to deal with all aspects of
Bayesian analysis; there are many publications on Bayesian statistics.


We now list some of the key definitions introduced in this chapter: 


Posterior distribution
Quadratic loss function 
Absolute error loss function 
100 (1 — a) % credible interval 
Prior odds ratio 

Posterior odds ratio 
Observable 


In this chapter, we have also learned the following important concepts and procedures: 


Bayesian parameter estimation procedure
Bayesian credible interval procedure
General decision theory procedure
Procedure to find optimal decision


11.7 COMPUTER EXAMPLES 


A very popular, free software package for Bayesian computation is WinBUGS, which can be
obtained from http://www.mrc-bsu.cam.ac.uk/bugs/. Computing posterior probabilities for proportions
using the steps we learned in Section 11.2 can be performed using Minitab. Refer to the book
Bayesian Computation Using Minitab, by Jim Albert (Wadsworth, 1996).


PROJECTS FOR CHAPTER 11 
11A. Predicting Future Observations 


Suppose we want to predict the value of future observations based on the prior and observed data. In
addition to the posterior distribution $f(\theta \mid x)$, in Bayesian statistics we are interested in the marginal
density of the observations (note that because both $\theta$ and $x$ are random, it makes sense to speak about
their joint, marginal, and conditional densities). Using the Bayes theorem, we have seen that the
marginal density function of the data at $x = (x_1, \ldots, x_n)$ (for the continuous case) is

$$g(x) = \int f(x \mid \theta)\, \pi(\theta)\, d\theta,$$

where $f(x \mid \theta)\, \pi(\theta)$ is the joint density of $x$ and $\theta$. This also can be written as


$$g(x) = E_{\pi}\left[f(x \mid \theta)\right],$$


the expected density of observations with respect to the prior distribution $\pi(\theta)$. With the help of
$g(x)$, we can predict observations.


We are more interested in the density of future observations $y$, given present data $x$. However, because
we have already updated the value of $\theta$ using the posterior density, this should be reflected in our
prediction:


$$f(y \mid x) = \int f(y, \theta \mid x)\, d\theta
= \int f(y \mid \theta, x)\, f(\theta \mid x)\, d\theta
= \int f(y \mid \theta)\, f(\theta \mid x)\, d\theta,$$


if $y$ and $x$ are conditionally independent given $\theta$. Conditional independence is achieved, for example,
when $x = (x_1, \ldots, x_n)'$ and $y = (x_{n+1}, \ldots, x_{n+m})'$ both are samples from $f(x \mid \theta)$.
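
When the integral has no convenient closed form, the predictive density can be approximated numerically
by replacing the integral over the parameter with a sum over a fine grid. The sketch below is one such
approximation, not a prescribed method: it assumes a normal likelihood with unit variance, a uniform
prior on [0, 2], and three made-up observations, all of which are illustrative choices.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mean, var=1.0):
    """Density of N(mean, var) at x."""
    return exp(-((x - mean) ** 2) / (2.0 * var)) / sqrt(2.0 * pi * var)

def grid_posterior(data, grid, prior):
    """Posterior weights on the grid points: prior times likelihood, normalized to sum to 1."""
    weights = []
    for theta, p in zip(grid, prior):
        likelihood = 1.0
        for x in data:
            likelihood *= normal_pdf(x, theta)
        weights.append(likelihood * p)
    total = sum(weights)
    return [w / total for w in weights]

def predictive_density(y, grid, posterior):
    """Approximate f(y | x) as the posterior-weighted average of f(y | theta)."""
    return sum(normal_pdf(y, theta) * q for theta, q in zip(grid, posterior))

# Assumed setup: 200 midpoints on [0, 2], uniform prior weights, hypothetical data.
n_grid = 200
grid = [2.0 * (i + 0.5) / n_grid for i in range(n_grid)]
prior = [1.0 / n_grid] * n_grid
data = [0.8, 1.3, 1.1]

posterior = grid_posterior(data, grid, prior)
for y in (0.0, 1.0, 3.0):
    print(f"f({y} | x) is approximately {predictive_density(y, grid, posterior):.4f}")
```

Because the posterior weights sum to 1, the approximation is itself a proper density in $y$: it is a finite
mixture of normal densities.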


We see that the density of future observations is the expected density of observations with respect to
the posterior distribution. Consider two different priors for $\theta$:


(1) Uniform $[0, 2]$, (2) N (1, z). Assume $f(x \mid \theta) \sim N(\theta, 1)$. Find the predictive distributions given the
sample $X_1, X_2, \ldots, X_n$.


Nonparametric Tests


Objective: In this chapter we shall introduce several classical nonparametric, or distribution-free, tests.
These tests do not require distributional assumptions about the population, such as normality.


Jacob Wolfowitz


(Source: www-groups.dcs.st-and.ac.uk/~history/Mathematicians/Wolfowitz.html)


Jacob Wolfowitz was born on 19 March 1910 in Warsaw, Russian Empire (now Poland), and died 
on 16 July 1981 in Tampa, Florida, USA. Wolfowitz's earliest interest was nonparametric inference, 
and the first joint paper he wrote with Abraham Wald introduced methods of calculating confidence
intervals that are not necessarily of fixed width. It is in this paper by Wolfowitz in 1942 that the
term nonparametric appears for the first time. Later, he worked on the area of sequential analysis 
and published work on sequential estimators of a Bernoulli parameter and results on the efficiency 
of certain sequential estimators. He also studied asymptotic statistical theory and worked on many 
aspects of the maximum likelihood method. Information theory pioneered by Shannon was another 
area to which Wolfowitz made important contributions, culminating in a classic book titled Coding 
Theorems of Information Theory (3rd ed. 1978). After working at different places such as the Statistical 
Research Group at Columbia University, the University of North Carolina, and the University of 
Illinois at Urbana, in 1978 he joined the faculty of the University of South Florida at Tampa. Wolfowitz 
was elected to the National Academy of Sciences and the American Academy of Arts and Sciences. 
He was also elected a Fellow of the Econometric Society, the International Statistical Institute, and the
Institute of Mathematical Statistics. In 1979 he was Shannon Lecturer of the Institute of Electrical and
Electronics Engineers.


12.1 INTRODUCTION 


Most of the tests that we have learned up to this point are based on the assumption that the
sample(s) came from a normal population, or at the least that the population probability distribution(s)
is specified except for a set of free parameters. Such tests are called parametric tests. A
parametric test is generally more powerful than other procedures when the underlying
assumptions are met. Usually the assumption of normality or any other distributional assumption
about the population is hard to verify, especially when the sample sizes are small or the data are
measured on an ordinal scale such as the letter grades of a student, in which case we do not have a precise
measurement. For example, incidence rates of rare diseases, data from gene-expression microarrays,
and the number of car accidents in a given time interval are not normally distributed. Nonparametric
tests are tests that do not make such distributional assumptions, particularly the usual assumption of
normality. In situations where a distributional model for a set of data is unavailable, nonparametric
tests are ideal. Even if the data are distributed normally, nonparametric methods are frequently almost
as powerful as parametric methods. These tests involve only order relationships among observations
and are based on analyzing the ranks of the variables instead of the original values.
Nonparametric methods include tests that do not involve population parameters at all, such as testing
whether the population is normal. Distribution-free tests generally do make some weak assumptions,
such as equality of population variances and/or that the distribution is of the continuous type.


Sometimes we may be required to make inferences about models that are difficult to parameterize, or
we may have data in a form that makes, say, normal-theory tests unsuitable. For example, incomes
of families generally follow a very skewed distribution. If we do a sample survey of a large number
of the families in a feeder area, the income distribution may look as in Figure 12.1.


This distribution is clearly difficult to parameterize, that is, it is difficult to identify a classical
probability distribution that will characterize the data’s behavior. Moreover, the mean income of this
sample may be misleading. A better measure of the central tendency is the median income. At least we
know that 50% of the families are below the median and 50% above. Appropriate techniques of inference
in these situations are based on distribution-free methods. Most of the nonparametric methods use only
the order of magnitude of the observations in a sample, known as order statistics, rather than the
observed values of the random variables.


FIGURE 12.1 Income distribution of families. (Plot of frequency against income; labeled groups:
unemployed, part time, full time, people with large assets.)


In general, nonparametric methods are appropriate for estimation or hypothesis testing problems
when the population distributions can be specified only in general terms. The conditions may be
specified as being continuous, symmetric, or identical, differing only in median or mean.


The distributions need not belong to specific families such as normal or gamma. Because most of
the nonparametric procedures depend on a minimum number of assumptions, the chance of their
being improperly used is relatively small. Also, nonparametric procedures may be used when the
data are measured on a weak scale, such as count data or rank data only. We may ask: Why not
use nonparametric methods all the time? The answer lies in the fact that when the assumptions
of the parametric tests can be verified as true, parametric tests are generally more powerful than
nonparametric tests. Nonparametric methods use only the ranks; although the ranks preserve
information about the order of the data, the actual values are not used, so some information is lost.
Because of this, nonparametric procedures cannot be as powerful as their parametric counterparts
when parametric tests can be used. For brevity and clarity, this chapter is presented without much
theoretical explanation in order to focus on the methods. Theoretical developments can be found in
many specialized books on the subject.


In this chapter, we study some of the commonly used classical nonparametric methods that are based 
on ordering, ranking, and permutations. The modern approaches are based on resampling methods 
such as bootstrap and will be discussed in Chapter 13. 


12.2 NONPARAMETRIC CONFIDENCE INTERVAL 


We have seen that for a large sample, using the Central Limit Theorem, we can obtain a confidence 
interval for a parameter within a well-defined probability distribution. However, for small samples, 
we need to make distributional assumptions that are often difficult to verify. For this reason, in practice 
it is often advisable to construct confidence intervals or interval estimates of population quantities
that are not parameters of a particular family of distributions. In a nonparametric setting, we need
procedures where the sample statistics used have distributions that do not depend on the population 
distribution. The median is commonly used as a parameter in nonparametric settings. We assume 
that the population distribution is continuous. 


Let M denote the median of a distribution and X (assumed to be continuous) be any observation
from that distribution. Then


P(X < M) = P(X > M) = 1/2.


This implies that, for a given random sample X1, ..., Xn from a population with median M, the
number of observations falling below M follows a binomial distribution with parameters n and
p = 1/2, irrespective of the population distribution. That is, let N− be the number of observations
less than M. Then the distribution of N− is binomial with parameters n and p = 1/2 for a sample of
size n. Hence, we can construct a confidence interval for the median using the binomial distribution.


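For a given probability value α, we can determine a and b such that


$$P(N^- \le a) = \sum_{i=0}^{a} \binom{n}{i} \left(\frac{1}{2}\right)^{n} = \frac{\alpha}{2}$$


and


$$P(N^- \ge b) = \sum_{i=b}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^{n} = \frac{\alpha}{2}.$$


If exact probabilities cannot be achieved, choose a and b such that the probabilities are as close
as possible to the value of α/2. Furthermore, let X(1), X(2), ..., X(a), ..., X(b), ..., X(n) be the order
statistics of X1, ..., Xn as in Figure 12.2.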


Then the population median will be above the order statistic X(b) (α/2)100% of the time and below the
order statistic X(a) (α/2)100% of the time. Hence, a (1 − α)100% confidence interval for the median
of a population distribution will be

X(a) < M < X(b).


FIGURE 12.2 Ordered sample: X(1), X(2), ..., X(a), ..., M, ..., X(b), ..., X(n−1), X(n).


We can write this result as P(X(a) < M < X(b)) = 1 − α.


By dividing the upper and lower tail probabilities equally, we find that b = n + 1 − a. Therefore, the
confidence interval becomes


X(a) < M < X(n+1−a).

In practice, a will be chosen so as to come as close to attaining α/2 as possible.


We can summarize the nonparametric procedure for finding the confidence interval for the population 
median as follows. 


PROCEDURE FOR FINDING (1 − α)100% CONFIDENCE INTERVAL FOR THE MEDIAN M
For a sample of size n:

1. Arrange the data in ascending order.

2. From the binomial table with n and p = 1/2, find the value of a such that

P(X ≤ a) = α/2 or nearest to α/2.

3. Set b = n + 1 − a.
4. Then the confidence interval is such that the lower limit is the ath value and the upper limit is the
bth value of the observations in step 1.
Assumptions: Population distribution is continuous; the sample is a simple random sample.
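
The four-step procedure can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the names `binom_cdf` and `median_ci` are illustrative, the binomial table lookup is replaced by a direct computation with `math.comb`, the search for a is restricted to 1, ..., n/2 so that the ath order statistic exists, and the reported coverage, P(a ≤ N− ≤ b − 1), is an extra quantity that is not part of the procedure itself.

```python
from math import comb


def binom_cdf(k, n):
    """P(X <= k) for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k + 1)) / 2 ** n


def median_ci(data, alpha):
    """Nonparametric confidence interval for the median (four-step procedure).

    Returns (lower, upper, a, b, coverage).
    """
    x = sorted(data)                      # step 1: ascending order
    n = len(x)
    # Step 2: a in 1..n//2 whose lower-tail probability is nearest alpha/2.
    # (Restricting a >= 1 is an assumption of this sketch.)
    a = min(range(1, n // 2 + 1),
            key=lambda k: abs(binom_cdf(k, n) - alpha / 2))
    b = n + 1 - a                         # step 3
    # Exact coverage of (x_(a), x_(b)) for a continuous population:
    # P(a <= N- <= b - 1), where N- ~ Binomial(n, 1/2).
    coverage = binom_cdf(b - 1, n) - binom_cdf(a - 1, n)
    return x[a - 1], x[b - 1], a, b, coverage   # step 4 (1-based positions)


ages = [24, 31, 28, 43, 28, 56, 48, 39, 52, 32,
        38, 49, 51, 49, 62, 33, 41, 58, 63, 56]
print(median_ci(ages, 0.05))
```

For a sample of 20 observations and α = 0.05, the sketch selects a = 5 and b = 16, and the exact coverage of the resulting interval is about 0.988, so the interval is conservative relative to the nominal 95% level.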


We illustrate this four-step procedure with an example. 


Example 12.2.1 


In a large company, the following data represent a random sample of the ages of 20 employees. 
24 31 28 43 28 56 48 39 52 32 
38 49 51 49 62 33 41 58 63 56 
Construct a 95% confidence interval for the population median M of the ages of the employees of this 
company. 
Solution 
For a 95% confidence interval, α = 0.05. Hence, α/2 = 0.025. The ordered data are
24 28 28 31 32 33 38 39 41 43 
48 49 49 51 52 56 56 58 62 63 


Looking at the binomial table with n = 20 and p = 1/2, we see that P(X ≤ 5) = 0.0207. Hence, a = 5 comes
closest to achieving α/2 = 0.025. Hence, in the ordered data, we should use the fifth observation, 32, for the
lower confidence limit and the 16th observation (n + 1 − a = 21 − 5 = 16), 56, for the upper confidence
limit. Therefore, an approximate 95% confidence interval for M is


32 <M <56. 


That is, we are at least 95% certain that the true median of the employee ages of this company will be 
greater than 32 and less than 56. 


Example 12.2.2 


A drug is suspected of causing an elevated heart rate in a certain group of high-risk patients. Twenty 
patients from this group were given the drug. The changes in heart rates were found to be as follows. 


-1 8 5 10 2 12 7 9 1 3
4 6 4 20 11 2 -1 10 2 8


Construct a 98% confidence interval for the mean change in heart rate. Can we assume that the population 
has a normal distribution? Interpret your answer. 


Solution 

First testing for normality, we get the probability plot shown in Figure 12.3. 

This shows that the normality assumption may not be satisfied, and thus the nonparametric method is more 
suitable (this conclusion is based strictly on the normal probability plot). Using a box plot, we could also test 
for outliers. The ordered data are 


-1 -1 1 2 2 2 3 4 4 5
6 7 8 8 9 10 10 11 12 20


FIGURE 12.3 Normal probability plot for heart rate. (Vertical axis: percent. ML estimates: mean 6.1,
standard deviation 4.97896.)

Looking at the binomial table with n = 20 and p = 1/2, we see that P(X ≤ 4) = 0.006. Hence, a = 4 comes
closest to achieving α/2 = 0.01. Hence, in the ordered data, we should use the fourth observation, 2, for the
lower confidence limit and the 17th observation (n + 1 − a = 21 − 4 = 17), 10, for the upper confidence
limit. Therefore, an approximate 98% confidence interval for M is


2<M < 10. 


That is, we are at least 98% certain that the true median change in heart rate will be greater
than 2 and less than 10.
If we use the usual t-based procedure, we get the 98% confidence interval (3.20, 9.0). However, such an
interval may not be valid, because the normality assumption appears not to be satisfied.



EXERCISES 12.2 


12.2.1. For the following random sample values, construct a 95% confidence interval for the 
population median M: 


7.2 5.7 4.9 6.2 8.5 2.7 5.9 6.0 8.2


12.2.2. The following data represent a random sample of end-of-year bonuses for the lower-level 
managerial personnel employed by a large firm. Bonuses are expressed in percentage of 
yearly salary. 


6.2 9.2 8.0 7.7 8.4 9.1 7.4 6.7 8.6 6.9
8.9 10.0 9.4 8.8 12.0 9.9 11.7 9.8 3.2 4.6


Construct a 98% confidence interval for the median bonus expressed in percentage of 
yearly salary of this firm. Also, draw a probability plot and test for normality. Can this be 
considered a random sample? 


12.2.3. Air pollution in large U.S. cities is monitored to see if it conforms to requirements set by 
the Environmental Protection Agency. The following data, expressed as an air pollution 
index, give the air quality of a city for 10 randomly selected days. 


57.3 58.1 58.7 66.7 58.6 61.9 59.0 64.4 62.6 64.9 


(a) Draw a probability plot and test for normality. 
(b) Construct a 95% confidence interval for the actual median air pollution index for this 
city and interpret its meaning. 


12.2.4. A random sample from a population yields the following 25 values:


90 87 121 96 106 107 89 107 83 92
17 93 98 120 97 109 78 87 99 79
104 85 91 107 89


Give a 99% confidence interval for the population median.

12.2.5. In an experiment on the uptake of solutes by liver cells, a researcher found that six
determinations of the radiation, measured in counts per minute after 20 minutes of immersion,
were: 


2728 2585 2769 2662 2876 2777 


Construct a 99% confidence interval for the population median and interpret its meaning. 


12.2.6. The nominal resistance of a wire is 0.20 ohm. A testing of the wire randomly chosen from 
a large collection of such wires yields the following resistance data. 


0.199 0.211 0.198 0.201 0.197 0.200 0.198 0.208 


Obtain a 95% confidence interval for the population median. 


12.2.7. In order to measure the effectiveness of a new procedure for pruning grapes, 15 workers 
are assigned to prune an acre of grapes. The effectiveness is measured in worker-hours per 
acre for each person. 


5.2 5.0 4.8 4.5 3.9 6.1 4.2 4.4 5.5 5.8
4.2 5.3 4.9 4.7 4.9


Obtain a 99% confidence interval for the median time required to prune an acre of grapes 
for this procedure and interpret its meaning. 


12.2.8. The following data give the exercise capacity (in minutes) for 10 randomly chosen patients 
being treated for chronic heart failure. 


15 27 11 19 12 21 11 17 13 22


Obtain a 95% confidence interval for the median exercise capacity for patients being treated 
for chronic heart failure. 


12.2.9. The data given below refer to the in-state tuition costs (in dollars) of 15 randomly selected 
colleges from a list of the 100 best values in public colleges (source: Kiplinger’s Magazine, 
October 2000). 


3788 4065 2196 7360 5212 4137 4060 3956 
3975 7395 4058 3683 3999 3156 4354 


Obtain a 95% confidence interval for the median in-state tuition costs and interpret its 
meaning. 


12.3 NONPARAMETRIC HYPOTHESIS TESTS FOR ONE SAMPLE 


In this section, we study two popular tests for testing hypotheses about the population location,
or median: the sign test and the Wilcoxon signed rank test. The comparison of medians rather
than means is a technicality that is not important unless the data are skewed substantially. In such
cases, medians are somewhat more accurate than means for comparing the locations of probability
distributions. Further discussions on nonparametric tests can be found in many references, such as
those by W. J. Conover and by E. L. Lehmann. Before using nonparametric tests, it is desirable to test
for normality of the data using normal probability plots, to check for the existence of outliers using
box plots, and to use run tests to test for randomness of the data. Whatever method we choose, we
should test the assumptions it makes. These assumption checks are relatively easy using statistical
software packages. Many of the examples in this chapter are given more for illustration of the
nonparametric methods than for assumption violations of parametric tests or for comprehensive
assumption testing techniques. Also, when we use statistical software packages, generally, the p-value
of the test will be given in the output. In order to make a decision on a particular hypothesis, we just
need to compare the p-value with the chosen value of α. We are going to explain a more traditional
approach instead of using the p-value approach in the discussion; however, the computer example
section will illustrate the p-value approach.


12.3.1 The Sign Test 


In this section, we describe a test that is the nonparametric alternative to the one-sample t-test and 
to the paired-sample t-test. Let M be the median of a certain population. Then we know that 


P(X < M) =0.5 = P(X > M). 


We consider the problem of testing the null hypothesis 


H0: M = m0 versus Ha: M > m0.


Assume that the underlying population distribution is continuous so that P(X < M) = 0.5. Let Xi
be the ith observation and let N+ be the number of observations that are greater than m0. N+ will
be our test statistic. We will reject H0 if n+, the observed value of N+, is too large. This test is called
the sign test. A test at significance level α will reject H0 if n+ ≥ k, where k is chosen such that


P(N+ ≥ k when M = m0) = α.


Similarly, if the alternative is of the form Ha: M ≠ m0, the critical region is of the form N+ ≤ k or
N+ ≥ k1, where P(N+ ≤ k) + P(N+ ≥ k1) = α.


In order to determine such k and k1, we need to determine the distribution of N+. The test works on
the principle that if the sample were to come from a population with a continuous distribution, then
each of the observations falls above the median or below the median with probability 1/2. Hence, the
number of sample values falling below the median follows a binomial distribution with parameters
n and p = 1/2, n being the sample size. If a sample value equals the hypothesized median m0, that
observation will be discarded and the sample size will be adjusted accordingly (we remark that
such values should be very few). Thus, when H0 is true, N+ will have a binomial distribution with
parameters n and p = 1/2. For this reason, some authors call this test the binomial test. The following
box summarizes the test procedure and the corresponding critical regions.

SIGN TEST
H0: M = m0

Alternative Hypothesis     Critical Region

Ha: M > m0                 N+ ≥ k, where $\sum_{i=k}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^{n} = \alpha$

Ha: M < m0                 N+ ≤ k, where $\sum_{i=0}^{k} \binom{n}{i} \left(\frac{1}{2}\right)^{n} = \alpha$

Ha: M ≠ m0                 N+ ≥ k1, where $\sum_{i=k_1}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^{n} = \frac{\alpha}{2}$
                           or
                           N+ ≤ k, where $\sum_{i=0}^{k} \binom{n}{i} \left(\frac{1}{2}\right)^{n} = \frac{\alpha}{2}$

If α or α/2 cannot be achieved exactly, choose k (or k and k1) so that the probability comes as close to α (or
α/2) as possible.
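
The upper-tail critical value k in the box can be found by direct computation instead of a table lookup. The following is a minimal sketch, assuming the "as close to α as possible" rule stated above; the function names `upper_tail` and `upper_critical_value` are illustrative and not part of the text.

```python
from math import comb


def upper_tail(k, n):
    """P(N+ >= k) for N+ ~ Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


def upper_critical_value(n, alpha):
    """Return (k, size), where k has upper-tail probability nearest alpha."""
    k = min(range(n + 1), key=lambda j: abs(upper_tail(j, n) - alpha))
    return k, upper_tail(k, n)


# Critical value for the upper tail alternative with n = 9 and alpha = 0.05.
print(upper_critical_value(9, 0.05))
```

By the symmetry of the binomial distribution with p = 1/2, a lower-tail critical value with the same size is n − k, and a two-tailed region can be obtained by calling the function with α/2.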


We now summarize the procedure of the sign test in the case of an upper tail alternative. The other 
two cases are similar. 


HYPOTHESIS TESTING PROCEDURE BY SIGN TEST
We test


H0: M = m0 vs. Ha: M > m0.


1. Replace each value of the observation that is greater than m0 by a plus sign and each sample value
less than m0 by a minus sign. If the sample value is equal to m0, discard the observation and adjust
the sample size n accordingly.
2. Let n+ be the number of +’s in the sample. For n and p = 1/2, from the binomial table, find


γ = P(N+ ≥ n+).


3. Decision: If γ is less than α, H0 must be rejected. Based on the sample, we will conclude that the
median of the population is greater than m0 at the significance level α. Otherwise do not reject H0.


Assumptions: The population distribution is continuous. The number of ties is small (less than 10% of the
sample).

Note that the approach described in the foregoing procedure is nothing but the p-value method for
hypothesis testing regarding a median using the sign test. Recall that the p-value is the probability
of observing a test statistic as extreme as or more extreme than what was actually observed, under the
assumption that the null hypothesis is true. In the sign test, we assume that the median is
M = m0, so 50% of the data should be less than m0 and 50% of the data greater than m0. Thus, we
expect half of the data to result in plus signs and half to result in minus signs. Hence, we can think of
the data as following a binomial distribution with p = 1/2 under the null hypothesis. The p-value is
computed from its definition given by the formula


$$\text{p-value} = P(N^+ \ge n^+) = \sum_{i=n^+}^{n} \binom{n}{i} \left(\frac{1}{2}\right)^{n} = \gamma.$$


The p-value method is to reject the null hypothesis if the computed p-value is less than α. These
binomial probabilities can be obtained from the binomial tables or from statistical software packages. The
following example illustrates how we apply the three-step procedure.




Example 12.3.1 
For the given data from an experiment


1.51 1.35 1.69 1.48 1.29 1.27 1.54 1.39 1.45


test the hypothesis that H0: M = 1.4 versus Ha: M > 1.4 at α = 0.05.


Solution
We test


H0: M = 1.4 versus Ha: M > 1.4.


Replacing each value greater than 1.4 with a plus sign and each value less than 1.4 with a minus sign, we
have


+ − + + − − + − +


Thus, n+ = 5. From the binomial table with n = 9 and p = 1/2, we have
P(N+ ≥ 5) = 0.50.


Thus, the p-value is 0.5. Because α = 0.05 < 0.50, the null hypothesis is not rejected. There is not
enough evidence to conclude that the median exceeds 1.4.
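
The three-step sign test procedure for an upper tail alternative can be sketched in Python as follows. This is a minimal sketch: the function name `sign_test_upper` is illustrative, and the binomial table is replaced by a direct sum of binomial probabilities with p = 1/2.

```python
from math import comb


def sign_test_upper(data, m0):
    """Exact sign test of H0: M = m0 against Ha: M > m0.

    Returns (n_plus, n, p_value), where p_value = P(N+ >= n_plus).
    """
    kept = [x for x in data if x != m0]          # step 1: drop values equal to m0
    n = len(kept)
    n_plus = sum(1 for x in kept if x > m0)      # step 2: count the plus signs
    p_value = sum(comb(n, i) for i in range(n_plus, n + 1)) / 2 ** n
    return n_plus, n, p_value


data = [1.51, 1.35, 1.69, 1.48, 1.29, 1.27, 1.54, 1.39, 1.45]
n_plus, n, p_value = sign_test_upper(data, 1.4)
print(n_plus, n, p_value)                        # step 3: reject H0 if p_value < alpha
```

For the nine observations of Example 12.3.1 with m0 = 1.4, the sketch counts five plus signs and returns a p-value of 0.5.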


When the sample size n is large, we can apply the normal approximation to the binomial distribution.
That is, the test statistic N+ is approximately normally distributed. Thus, under H0, N+ will
have an approximate normal distribution with mean np = n/2 and variance np(1 − p) = n/4. By the
z-transform, we have


$$Z = \frac{N^+ - n/2}{\sqrt{n/4}} = \frac{2N^+ - n}{\sqrt{n}} \sim N(0, 1).$$


We could utilize this test if n is large, that is, if np ≥ 5 and n(1 − p) ≥ 5. Hence, under H0, because
p = 1/2, if n ≥ 10, we could use the large sample test. The following table summarizes the large
sample sign test.


A SIGN TEST FOR A LARGE RANDOM SAMPLE
When the sample size is large (n ≥ 10), we can use the normal approximation to a binomial. This leads to
the large sample sign test:


H0: M = m0
versus
Alternative Hypothesis     Rejection Region
Ha: M > m0                 z > z_α
Ha: M < m0                 z < −z_α
Ha: M ≠ m0                 |z| > z_{α/2}
The test statistic is

$$Z = \frac{2N^+ - n}{\sqrt{n}}.$$


Decision: Reject H0 if the test statistic falls in the rejection region, and conclude that Ha is true with
(1 − α)100% confidence. Otherwise, do not reject H0 because there is not enough evidence to conclude
that Ha is true for a given α, and more experiments are needed.


Assumptions: (i) Population distribution is continuous. (ii) Sample size greater than or equal to 10 (after
the removal of ties). (iii) The number of ties is small (less than 10% of the sample size).


We illustrate this procedure with the following example. 


Example 12.3.2
In order to measure the effectiveness of a new procedure for pruning grapes, 15 workers are assigned to
prune an acre of grapes. The effectiveness is measured in worker-hours/acre for each person.

5.2 5.0 4.8 3.9 6.1 4.2 4.4 5.5 5.8 4.5
4.2 5.3 4.9 4.7 4.9

Test the null hypothesis that the median time to prune an acre of grapes with this method is 4.5 hours
against the alternative that it is larger. Use α = 0.05.
Solution
We test


H0: M = 4.5 versus Ha: M > 4.5.


Replacing each value greater than 4.5 with a plus sign and each value less than 4.5 with a minus sign, we
have


+ + + − + − − + + − + + + +


Because there is one observation that is equal to 4.5, we must discard it and take n = 14.
Thus N+ = 10. Using the large sample approximation, the test statistic is


$$Z = \frac{2N^+ - n}{\sqrt{n}} = \frac{20 - 14}{\sqrt{14}} = 1.6.$$


For α = 0.05, from the standard normal table, the value of z_0.05 = 1.645. Hence, the rejection region is
z > 1.645. Because the observed value of the test statistic does not fall in the rejection region, we do not
reject the null hypothesis at α = 0.05. There is not enough evidence to conclude that the median time
to prune an acre of grapes exceeds 4.5 hours.
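
The large sample sign test can be sketched in Python as follows. This is a minimal sketch for the upper tail alternative only: the function name `large_sample_sign_test` is illustrative, and the standard normal table is replaced by `statistics.NormalDist` from the standard library.

```python
import math
from statistics import NormalDist


def large_sample_sign_test(data, m0):
    """Return (n_plus, n, z, p_upper) for H0: M = m0 against Ha: M > m0."""
    kept = [x for x in data if x != m0]        # discard observations equal to m0
    n = len(kept)
    n_plus = sum(1 for x in kept if x > m0)    # number of plus signs
    z = (2 * n_plus - n) / math.sqrt(n)        # Z = (2 N+ - n) / sqrt(n)
    p_upper = 1 - NormalDist().cdf(z)          # upper-tail p-value under N(0, 1)
    return n_plus, n, z, p_upper


hours = [5.2, 5.0, 4.8, 3.9, 6.1, 4.2, 4.4, 5.5, 5.8, 4.5,
         4.2, 5.3, 4.9, 4.7, 4.9]
print(large_sample_sign_test(hours, 4.5))
```

For the pruning data, the sketch gives N+ = 10 with n = 14 and z of about 1.60, which is below the critical value 1.645. Equivalently, the upper-tail p-value is slightly above 0.05.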


12.3.2 Wilcoxon Signed Rank Test 


In the sign test, we have considered only whether each observation is greater than m0 or less than
m0 without giving any importance to the magnitude of the difference from m0. An improved version
of the sign test is the Wilcoxon signed rank test, in which one replaces the observations by the
ranks of the ordered magnitudes of the differences, |xi − m0|. The smallest absolute difference is
ranked as 1, the next smallest as 2, and so on. However, the Wilcoxon signed rank test requires the
additional assumption that the continuous population distribution is symmetric with respect to its
center. Thus, if the data are ordinal, the Wilcoxon test cannot be used.


HYPOTHESIS TESTING PROCEDURE BY WILCOXON SIGNED RANK TEST
We test
H0: M = m0 versus Ha: M ≠ m0.


1. Compute the absolute differences zi = |xi − m0| for each observation. Replace each value of the
observation that is greater than m0 by a plus sign and each sample value that is less than m0 by a
minus sign. If the sample value is equal to m0, discard the observation and adjust the sample size
n accordingly.

2. Assign each zi a value equal to its rank. If two values of zi are equal, assign each zi a rank equal to
the average of the ranks each should receive if there were not a tie.

3. Let W+ be the sum of the ranks associated with plus signs and W− be the sum of the ranks associated
with minus signs.

4. Decision: If m0 is the true median, then the observations should be evenly distributed about m0.
For a size α critical region, reject H0 if


W+ ≤ c1, where P(W+ ≤ c1) = α/2
or
W+ ≥ c2, where P(W+ ≥ c2) = α/2.


Assumptions: The population distribution is continuous and symmetrical. The number of ties is small,
less than 10% of the sample size.


The exact distribution of W+ is considerably complicated and we will not derive it. However, for
certain values of n, the distribution is given in the Wilcoxon signed rank test table.


For the Wilcoxon signed rank test, the rejection regions for the one-sided alternative hypotheses are
given next.


For

Ha: M > m0, the rejection region is w+ ≥ c, where P(W+ ≥ c) = α,
and for

Ha: M < m0, the rejection region is w+ ≤ c, where P(W+ ≤ c) = α.

We illustrate the Wilcoxon signed rank test with the following examples. 




Example 12.3.3 
For the given data that resulted from an experiment


1.51 1.35 1.69 1.48 1.29 1.27 1.54 1.39 1.45


test the hypothesis that H0: M = 1.4 versus Ha: M ≠ 1.4. Use α = 0.05.


Solution
We test


H0: M = 1.4 versus Ha: M ≠ 1.4.


Here, α = 0.05, and m0 = 1.4. The results of steps 1 to 3 are given in Table 12.1.

Thus, we have W+ = 29 and n = 9. From the Wilcoxon signed-rank test table in the appendix, we should
reject H0 if W+ ≤ 6 or W+ > 38 with actual size of α = 0.054. Because W+ = 29 does not fall in the
rejection region, we do not reject the null hypothesis that M = 1.4.

Table 12.1
xi       zi = |xi − 1.4|    Sign    Rank

1.51     0.11               +       5.5
1.35     0.05               −       2.5
1.69     0.29               +       9
1.48     0.08               +       4
1.29     0.11               −       5.5
1.27     0.13               −       7
1.54     0.14               +       8
1.39     0.01               −       1
1.45     0.05               +       2.5

Example 12.3.4 

Air pollution in large U.S. cities is monitored to see whether it conforms to requirements set by the
Environmental Protection Agency. The following data, expressed as an air pollution index, give the air
quality of a city for 10 randomly selected days.


57.3 58.1 58.7 66.7 58.6 61.9 59.0 64.4 62.6 64.9

Test the hypothesis that H0: M = 65 versus Ha: M < 65. Use α = 0.05.


Solution
We test


H0: M = 65 versus Ha: M < 65.


Here, α = 0.05, and m0 = 65.

The results of steps 1 to 3 are given in Table 12.2.

Thus, W+ = 3, and n = 10. Using the Wilcoxon signed rank test table, we should reject H0 if W+ ≤ 10 with
actual size of α = 0.042. Because the observed value of W+ falls in the rejection region, we reject H0 and
conclude that the sample evidence suggests that the median air pollution index is less than 65.


Table 12.2
xi       zi = |xi − 65|    Sign    Rank

57.3     7.7               −       10
58.1     6.9               −       9
58.7     6.3               −       7
66.7     1.7               +       3
58.6     6.4               −       8
61.9     3.1               −       5
59.0     6.0               −       6
64.4     0.6               −       2
62.6     2.4               −       4
64.9     0.1               −       1


The Wilcoxon signed rank test is a nonparametric alternative to the one-sample t-test. The question
then is, how do we decide which one to choose? Choose the one-sample t-test if it is reasonable
to assume that the population follows a normal distribution. Otherwise, choose the Wilcoxon
nonparametric test. However, the Wilcoxon test will have less power when the normality assumption
does hold. For example, a normal probability plot of the data of Example 12.3.4 is given in Figure 12.4.
Looking at this figure, we can see that the normality assumption is suspect. It may make more sense to
use the nonparametric method.


FIGURE 12.4 Normal probability plot for air pollution index. (Axes: index, probability. Average: 61.22;
Std Dev: 3.32158; N: 10. Kolmogorov-Smirnov normality test: D+: 0.248, D−: 0.131, D: 0.248;
approximate p-value: 0.081.)


When sample size n is sufficiently large, under the assumption of H0 being true, the distribution of
W+ is approximately normal with mean


$$E(W^+) = \frac{n(n+1)}{4}$$


and variance


$$Var(W^+) = \frac{n(n+1)(2n+1)}{24}.$$

Hence, the test statistic is given by


$$Z = \frac{W^+ - \frac{1}{4} n(n+1)}{\sqrt{n(n+1)(2n+1)/24}},$$


which has approximately the standard normal distribution. This approximation can be used when
n ≥ 20.


SUMMARY OF THE WILCOXON SIGNED RANK TEST FOR LARGE SAMPLES (n ≥ 20)


We test
H0: M = m0
versus
Ha: M > m0, upper tailed test
Ha: M < m0, lower tailed test
Ha: M ≠ m0, two-tailed test.
The test statistic:


$$Z = \frac{W^+ - \frac{1}{4} n(n+1)}{\sqrt{n(n+1)(2n+1)/24}}$$


Rejection region:


z > z_α, upper tail RR
z < −z_α, lower tail RR
|z| > z_{α/2}, two tail RR.


Decision: Reject H0 if the test statistic falls in the RR, and conclude that Ha is true with (1 − α)100%
confidence. Otherwise, do not reject H0, because there is not enough evidence to conclude that Ha is true
for a given α, and more experiments are needed.

Assumptions: (i) The population distribution is continuous and symmetric about its median. (ii) Sample size is
greater than or equal to 20. (iii) The number of ties is small, less than 10% of the sample size.


We illustrate the Wilcoxon signed rank test with the following example.

Example 12.3.5 
The following data give the monthly rents (in dollars) paid by a random sample of 25 households selected 
from a large city. 
425 960 1450 655 1025 750 670 975 660 880 
1250 780 870 930 550 575 425 900 525 1800 
545 840 765 950 1080 
Using the large sample Wilcoxon signed rank test, test the hypothesis that the median rent in this city is
$750 against the alternative that it is higher with α = 0.05.


Solution
We test
H0: M = 750 versus Ha: M > 750.


Here α = 0.05, and m0 = 750. The results of steps 1 to 3 are given in Table 12.3 (where the asterisk indicates
zi = 0).

Table 12.3
xi       zi = |xi − 750|    Sign    Rank
425      325                −       19.5
960      210                +       15
1450     700                +       23
655      95                 −       6
1025     275                +       18
750      0                  *       ignore
670      80                 −       3
975      225                +       16.5
660      90                 −       4.5
880      130                +       8
1250     500                +       22
780      30                 +       2
870      120                +       7
930      180                +       11
550      200                −       12.5
575      175                −       10
425      325                −       19.5
900      150                +       9
525      225                −       16.5
1800     1050               +       24
545      205                −       14
840      90                 +       4.5
765      15                 +       1
950      200                +       12.5
1080     330                +       21


Here, for n = 24, W+ = 194.5 (the sum of the ranks with plus signs in Table 12.3), and the test statistic is


$$Z = \frac{W^+ - \frac{1}{4} n(n+1)}{\sqrt{n(n+1)(2n+1)/24}} = \frac{194.5 - \frac{1}{4}(24)(25)}{\sqrt{(24)(25)(49)/24}} = \frac{44.5}{35} = 1.2714.$$


For α = 0.05, the rejection region is z > 1.645. Because the observed value of the test statistic does not fall
in the rejection region, we do not reject the null hypothesis. There is not enough evidence to conclude that
the median rent in this city is more than $750.
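
The large sample Wilcoxon signed rank statistic can be sketched in Python as follows. This is a minimal sketch: the function name `wilcoxon_large_sample` is illustrative, tied absolute differences receive the average of the ranks they span, and the differences are assumed to be exactly representable (as with the whole-dollar rents here), so that ties are detected by plain equality.

```python
import math


def wilcoxon_large_sample(data, m0):
    """Return (w_plus, n, z) using the normal approximation for W+."""
    diffs = [x - m0 for x in data if x != m0]      # discard values equal to m0
    n = len(diffs)
    abs_sorted = sorted(abs(d) for d in diffs)
    # Average rank for each distinct absolute difference (handles ties).
    rank_of = {}
    for value in set(abs_sorted):
        first = abs_sorted.index(value) + 1        # smallest rank in the tie block
        last = first + abs_sorted.count(value) - 1 # largest rank in the tie block
        rank_of[value] = (first + last) / 2
    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    mean = n * (n + 1) / 4                         # E(W+) under H0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24) # sqrt of Var(W+) under H0
    return w_plus, n, (w_plus - mean) / sd


rents = [425, 960, 1450, 655, 1025, 750, 670, 975, 660, 880,
         1250, 780, 870, 930, 550, 575, 425, 900, 525, 1800,
         545, 840, 765, 950, 1080]
w_plus, n, z = wilcoxon_large_sample(rents, 750)
print(w_plus, n, z, z > 1.645)                     # reject H0 only if z > 1.645
```

Applied to the rents of Example 12.3.5 with m0 = 750, the sketch sums the ranks of the positive differences and standardizes the total with n = 24.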


The rank tests are useful in situations where you suspect that the data do not come from a normal
population. It is important to note that ignoring the tied observations reduces the effective sample
size, which in turn reduces the power of the test (see Example 7.1.4 for the effect of n on the value of β).
This loss is not significant if there are only a few ties. However, if the ties are 10% or more, hypothesis
testing using rank tests becomes considerably conservative. That is, they yield error probabilities that
are significantly high.


12.3.3 Dependent Samples: Paired Comparison Tests 


The sign test and the Wilcoxon signed rank test can also be used for paired comparisons. The
experimental procedure typically consists of taking “before” and “after” readings, or otherwise
matched readings as in the paired t-test case, for each unit. Suppose there are n pairs of before and
after observations and we are interested in testing the equality of the two medians. One way to test
such observations is to consider the difference between the two observations for a unit to be a single
observation on that unit. Thus, we can treat the sample as being n observations on a population of
differences. For this new sample of differences, the testing problem becomes


H0: M = 0 versus Ha: M > 0 (or M < 0, or M ≠ 0).


Hence, the basic procedure is to first find the difference between the two observations for each
unit, and then follow the testing procedures explained earlier for the sign test or
the Wilcoxon signed rank test. Both small sample and large sample cases can be handled as before.
In the following example, we illustrate this concept for a large sample sign test.


Example 12.3.6 
A dietary program claims that 3 months of its diet will reduce weight. In order to test this claim, a random 
sample of eight individuals who went through this program for 3 months is taken. The following table gives 
weight in pounds. 


Before   180   199   175   226   189   205   169   211 
After    172   191   172   230   178   199   171   201 


Using a 5% significance level, is there evidence to conclude that the program really reduces the population 
median weight? 


Solution 
Let M denote the median of the population of differences in weight. We take each difference as 
"after" minus "before." Then we will test 


$$H_0: M = 0 \quad\text{versus}\quad H_a: M < 0.$$


We will use the large sample sign test. Replacing each value of the difference that is greater than zero by a 
+ sign and each value less than zero by a - sign, we have 


Difference   -8   -8   -3    4   -11   -6    2   -10 
Sign          -    -    -    +    -     -    +    - 


For n = 8 and $N^{+} = 2$, the test statistic is given by 

$$z = \frac{2N^{+} - n}{\sqrt{n}} = \frac{4 - 8}{\sqrt{8}} = -1.414.$$

For α = 0.05, $z_{0.05} = 1.645$, and the rejection region is z < -1.645. Because the observed value of the test 
statistic does not fall in the rejection region, we do not reject the null hypothesis. Thus, there is not enough 
evidence to conclude that the new program reduces the weight. 
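
The paired large sample sign test of this example can be sketched in Python as follows. The helper name `sign_test_z` is hypothetical, and the sketch assumes, as in the sign test described earlier, that zero differences are discarded before counting.

```python
import math

def sign_test_z(differences):
    """Return (N+, n, z) for the large sample sign test on paired differences."""
    nonzero = [d for d in differences if d != 0]   # zero differences are dropped
    n = len(nonzero)
    n_plus = sum(1 for d in nonzero if d > 0)      # number of + signs
    z = (2 * n_plus - n) / math.sqrt(n)
    return n_plus, n, z

before = [180, 199, 175, 226, 189, 205, 169, 211]
after = [172, 191, 172, 230, 178, 199, 171, 201]

# Differences are taken as "after" minus "before", as in the solution.
diffs = [a - b for a, b in zip(after, before)]
n_plus, n, z = sign_test_z(diffs)
print(diffs, n_plus, n, round(z, 3))
```

A lower tailed test at α = 0.05 then compares z with -1.645.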
EXERCISES 12.3 


12.3.1. It was reported that the median interest rate on 30-year fixed mortgages in a certain large city 
is 7.75% on a particular day, with zero points. A random sample of nine lenders produced 
the following data of interest rates in percentage. 


7.625 7.375 8.00 7.50 7.875 8.00 7.625 7.75 7.25 


Test the hypothesis that the median interest rate in this city is different from 7.75%, using 
(a) the sign test, and (b) the Wilcoxon signed rank test. Use α = 0.01. Compare the two 
results. 


12.3.2. It is believed that a typical family spends 35% of its income on food and groceries. A sample 
of eight randomly selected families yielded the following data. 


30 29 39 49 36 33 37 35 


Test the hypothesis that the median percentage of family income spent for food and groceries 
is 35 against the alternative that it is less than 35. Use α = 0.05. 


12.3.3. The SAT scores (out of a maximum possible score of 1600) for a random sample of 10 
students who took this test recently are: 


1355 765 890 1089 986 1128 1157 1065 1224 567 


Test the hypothesis that the median SAT score is 1000 against the alternative that it is greater 
using α = 0.05. Use both the sign test and the Wilcoxon signed rank test. Explain if the 
conclusions are different. 


12.3.4. The regulatory board of health in a particular state specifies that the fluoride levels in water 
must not exceed 1.5 parts per million (ppm). The 20 measurements given here represent 
the randomly selected daily early morning readings on fluoride levels in water at a certain 
city. 


0.88 0.82 0.97 0.95 0.84 0.90 0.87 0.78 0.75 0.83 
0.71 0.92 1.11 0.81 0.97 0.85 0.97 0.91 0.78 0.87 


Test the hypothesis that the median fluoride level for this city is 0.90 against the alternative 
that the median is different from 0.90 at α = 0.01, using (a) the large sample sign test, and 
(b) the Wilcoxon signed rank test. Interpret the results. 


12.3.5. The following data give the weights (in pounds) for a random sample of 20 NFL players. 


285 178 311 276 192 232 259 189 298 211 
269 285 296 193 288 254 246 234 274 229 


Test the hypothesis that the median weight of NFL players is 250 pounds against the alternative 
that it is greater at α = 0.05, using (a) the large sample sign test and (b) the Wilcoxon 
signed rank test. 


12.3.6. The following data give the amount of money (in dollars) spent on textbooks by 18 students 
for the last academic year at a large university. 


510 425 190 298 157 260 320 615 455 
490 188 115 230 610 220 155 315 110 


Test the hypothesis that the median amount spent on books at this university is $325 against 
the alternative that it is different using the large-sample sign test. Use α = 0.05. 


12.3.7. It is desired to study the effect of a special diet on systolic blood pressure. The following 
sample data are obtained for eight adults over 40 years of age before and after 6 months of 
this diet. 


Before 185 222 235 198 224 197 228 234 
After 188 217 229 190 226 185 225 231 


At the 95% confidence level, is there evidence to conclude that the new diet reduces the systolic 
blood pressure in individuals over 40 years old? Test (a) using the sign test, and (b) using 
the Wilcoxon signed rank test. Interpret the results. 


12.3.8. In an effort to study the effect on absenteeism of having a day-care facility at the workplace 
for women with newborn babies (less than 1 year old), a large company compared the 
number of absent days for a year for seven women with newborn children before and after 
instituting a day-care facility. 


Before 20 18 35 22 17 24 15 
After 16 9 22 28 19 13 10 


At 99% confidence level, is there evidence to conclude that having a day-care facility at the 
workplace reduces absenteeism for women with newborn children? 


12.4 NONPARAMETRIC HYPOTHESIS TESTS FOR TWO INDEPENDENT 
SAMPLES 


In this section we learn how to test the equality of the medians of two independent samples from 
two populations. This is especially useful when one studies the treatment effects, such as the effect of 
a certain drug to treat a given medical condition when we have two groups—an experimental group 
and a control group—or the effect of a particular type of teaching method. We will describe the median 
test, which corresponds to the sign test, and the Wilcoxon rank sum test. 


12.4.1 Median Test 

Let $m_1$ and $m_2$ be the medians of two populations 1 and 2, respectively, both with continuous 
distributions. Assume that we have a random sample of size $n_1$ from population 1 and a random 
sample of size $n_2$ from population 2. The median test can be summarized as follows. 
HYPOTHESIS TESTING PROCEDURE USING MEDIAN TEST 
We test 

$$H_0: m_1 = m_2 \quad\text{versus}\quad H_a: \begin{cases} m_1 > m_2, & \text{upper tailed test} \\ m_1 < m_2, & \text{lower tailed test} \\ m_1 \neq m_2, & \text{two-tailed test.} \end{cases}$$

1. Combine the two samples into a single sample of size $n_1 + n_2$, keeping track of each observation's 
original population. Arrange the $n_1 + n_2$ observations in increasing order and find the median of 
this combined sample. If the median is one of the sample values, discard those observations and 
adjust the sample size accordingly. 

2. Define $N_{1b}$ to be the number of observations in the sample from population 1 that fall below the 
combined median (under $H_0$ we would expect this number to be around $n_1/2$). 

3. Decision: If $H_0$ is true, then we would expect $N_{1b}$ to be equal to some number around $n_1/2$. For 
$H_a: m_1 > m_2$, the rejection region is $N_{1b} \le c$, where $P(N_{1b} \le c) = \alpha$; for $H_a: m_1 < m_2$, the rejection 
region is $N_{1b} \ge c$, where $P(N_{1b} \ge c) = \alpha$; and for $H_a: m_1 \neq m_2$, the rejection region is $N_{1b} \ge c_1$ 
or $N_{1b} \le c_2$, where 

$$P(N_{1b} \ge c_1) = \frac{\alpha}{2} \quad\text{and}\quad P(N_{1b} \le c_2) = \frac{\alpha}{2}.$$

Assumptions: (i) Population distribution is continuous. (ii) Samples are independent. 


Let $n_1 + n_2 = 2k$. Under $H_0$, $N_{1b}$ has a hypergeometric distribution given by 

$$P(N_{1b} = n_{1b}) = \frac{\binom{n_1}{n_{1b}}\binom{n_2}{k - n_{1b}}}{\binom{n_1 + n_2}{k}}, \quad n_{1b} = 0, 1, 2, \ldots, n_1,$$

with the assumption that $\binom{i}{j} = 0$ if $j > i$. Note that the hypergeometric distribution is a discrete 
distribution that describes the number of "successes" in a sequence of n draws from a finite population 
without replacement. Thus, we can find the values of $c$, $c_1$, and $c_2$ required earlier. This calculation 
can be tedious. To overcome this, we can use the following large sample approximation valid for 
$n_1 > 5$ and $n_2 > 5$. First classify each observation as above or below the sample median as shown in 
Table 12.4. 


Table 12.4 

            Below      Above      Totals 
Sample 1    N_{1b}     N_{1a}     n_1 
Sample 2    N_{2b}     N_{2a}     n_2 
Total       N_b        N_a        n_1 + n_2 = n 

It can be verified that the expected value and variance of $N_{1a}$ are given by 

$$E(N_{1a}) = \frac{N_a n_1}{n} \quad\text{and}\quad \mathrm{Var}(N_{1a}) = \frac{N_a n_1 n_2 N_b}{n^2 (n - 1)}.$$

Thus, for a large sample we can write 

$$z = \frac{N_{1a} - E(N_{1a})}{\sqrt{\mathrm{Var}(N_{1a})}} \sim N(0, 1).$$


Hence, we can follow the usual large sample rejection region procedure, which is summarized next. 


SUMMARY OF LARGE SAMPLE MEDIAN TEST ($n_1 > 5$ AND $n_2 > 5$) 


We test 

$$H_0: m_1 = m_2 \quad\text{versus}\quad H_a: \begin{cases} m_1 > m_2, & \text{upper tailed test} \\ m_1 < m_2, & \text{lower tailed test} \\ m_1 \neq m_2, & \text{two-tailed test.} \end{cases}$$

The test statistic: 

$$z = \frac{N_{1a} - E(N_{1a})}{\sqrt{\mathrm{Var}(N_{1a})}},$$

where 

$$E(N_{1a}) = \frac{N_a n_1}{n} \quad\text{and}\quad \mathrm{Var}(N_{1a}) = \frac{N_a n_1 n_2 N_b}{n^2 (n - 1)}.$$

Rejection region: 

$$\begin{cases} z > z_\alpha, & \text{upper tail RR} \\ z < -z_\alpha, & \text{lower tail RR} \\ |z| > z_{\alpha/2}, & \text{two tail RR.} \end{cases}$$

Decision: Reject $H_0$ if the test statistic falls in the RR, and conclude that $H_a$ is true with $(1 - \alpha)100\%$ 
confidence. Otherwise, do not reject $H_0$, because there is not enough evidence to conclude that $H_a$ is true 
for a given α and more experiments are needed. 


Assumptions: (i) Population distributions are continuous. (ii) $n_1 > 5$ and $n_2 > 5$. 


We illustrate this procedure with the following example. 




Example 12.4.1 
Given below are the mileages (in thousands of miles) of two samples of automobile tires of two different 
brands, say I and II, before they wear out. 


Tire I:    34   32   37   35   42   43   47   58   59   62   69   71   78   84 
Tire II:   39   48   54   65   70   76   87   90   111   118   126   127 

Use the median test to see whether tire II gives more median mileage than tire I. Use α = 0.05. 


Solution 
We test 


$$H_0: m_1 = m_2 \quad\text{versus}\quad H_a: m_1 < m_2.$$


Because the sample size assumption is satisfied, we will use the large sample normal approximation. The 
results of steps 1 and 2, using the notation A for above the median and B for below the median, are given 
in Table 12.5. 

Table 12.5 

Sample values    Population    Above/below the median 
32               I             B 
34               I             B 
35               I             B 
37               I             B 
39               II            B 
42               I             B 
43               I             B 
47               I             B 
48               II            B 
54               II            B 
58               I             B 
59               I             B 
62               I             B 
65               II            A 
69               I             A 
70               II            A 
71               I             A 
76               II            A 
78               I             A 
84               I             A 
87               II            A 
90               II            A 
111              II            A 
118              II            A 
126              II            A 
127              II            A 


The median is 63.5. Thus, we obtain Table 12.6. 


Table 12.6 

            Below          Above          Totals 
Sample 1    N_{1b} = 10    N_{1a} = 4     n_1 = 14 
Sample 2    N_{2b} = 3     N_{2a} = 9     n_2 = 12 
Total       N_b = 13       N_a = 13       n_1 + n_2 = n = 26 
Also, 

$$E(N_{1a}) = \frac{N_a n_1}{n} = \frac{(13)(14)}{26} = 7$$

and 

$$\mathrm{Var}(N_{1a}) = \frac{N_a n_1 n_2 N_b}{n^2 (n - 1)} = \frac{(13)(13)(14)(12)}{16{,}900} = 1.68.$$

Hence, the test statistic is 

$$z = \frac{N_{1a} - E(N_{1a})}{\sqrt{\mathrm{Var}(N_{1a})}} = \frac{4 - 7}{\sqrt{1.68}} = -2.31.$$

For α = 0.05, $z_{0.05} = 1.645$. Hence, the rejection region is {z < -1.645}. Because the observed value 
of z falls in the rejection region, we reject $H_0$ and conclude that, at α = 0.05, there is 
evidence that tire II gives a higher median mileage than tire I. 
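
The counting and the large sample statistic of the median test can be reproduced with a short Python sketch. The function name `median_test_z` is a hypothetical helper for illustration; it follows the procedure as stated above (observations equal to the combined median are discarded) and uses the mean and variance formulas of the summary box.

```python
import math
from statistics import median

def median_test_z(sample1, sample2):
    """Return (combined median, N_1a, E(N_1a), Var(N_1a), z) for the median test."""
    m = median(sample1 + sample2)
    # Step 1: discard observations equal to the combined median.
    s1 = [x for x in sample1 if x != m]
    s2 = [x for x in sample2 if x != m]
    n1, n2 = len(s1), len(s2)
    n = n1 + n2
    n1a = sum(1 for x in s1 if x > m)          # sample 1 values above the median
    na = n1a + sum(1 for x in s2 if x > m)     # all values above the median
    nb = n - na                                # all values below the median
    mean = na * n1 / n
    var = na * n1 * n2 * nb / (n ** 2 * (n - 1))
    return m, n1a, mean, var, (n1a - mean) / math.sqrt(var)

tire1 = [34, 32, 37, 35, 42, 43, 47, 58, 59, 62, 69, 71, 78, 84]
tire2 = [39, 48, 54, 65, 70, 76, 87, 90, 111, 118, 126, 127]

m, n1a, mean, var, z = median_test_z(tire1, tire2)
print(m, n1a, mean, round(var, 2), round(z, 4))
```

For the lower tailed alternative of this example, z is compared with -1.645.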


12.4.2 The Wilcoxon Rank Sum Test 


The Wilcoxon rank sum test is used for comparing the medians of two independent populations, 
as in the two-sample t-test in the parametric case. For accurate results, it is necessary to assume 
that the variances of the populations are equal. This test is quite similar to the Wilcoxon signed 
rank test. Whereas the one-sample Wilcoxon signed rank test requires an additional assumption that 
the population distribution is symmetric, such an assumption is not necessary for the two-sample 
Wilcoxon rank sum test. This test can be applied for skewed distributions. The test is almost as 
powerful as the parametric version when the population distributions are close to normal. Many 
statistical software packages do not give the Wilcoxon rank sum test; instead the Mann-Whitney test 
is given. It should be noted that the Wilcoxon rank sum test is equivalent to the Mann-Whitney 
U-test. We will not separately describe the Mann-Whitney test; however, in practice just perform the 
Mann-Whitney test if the software has only that test. 


Assume that we have $n_1$ observations randomly sampled from population I and $n_2$ observations 
randomly sampled from population II with $n_1 \le n_2$. The Wilcoxon rank sum test procedure can be 
summarized as follows. 


HYPOTHESIS TESTING PROCEDURE BY WILCOXON RANK SUM TEST 
We test 

$$H_0: m_1 = m_2 \quad\text{versus}\quad H_a: m_1 \neq m_2.$$

1. Combine the two samples into a single sample of size $n_1 + n_2$, keeping track of each observation's 
original population. Arrange the $n_1 + n_2$ observations in ascending order and assign ranks. 
2. Sum the ranks of observations from population II and call it R. 
3. Let the test statistic be $W = R - \frac{1}{2} n_2 (n_2 + 1)$. 
4. Decision: If $H_0$ is false, one would expect that the value of W would be very small or very large. For 
a size α critical region, reject $H_0$ if 

$$W \le c_1, \text{ where } P(W \le c_1) = \frac{\alpha}{2},$$

or 

$$W \ge c_2, \text{ where } P(W \ge c_2) = \frac{\alpha}{2}.$$

Note: The exact distribution of W is given in the Wilcoxon rank sum test table in the appendix for 
small values of $n_1$ and $n_2$. 


In the Wilcoxon rank sum test, based on the alternative hypothesis, we have the following rejection 
regions. For 

$$H_a: m_1 > m_2, \text{ the rejection region is } W \ge c, \text{ where } P(W \ge c) = \alpha,$$

and for 

$$H_a: m_1 < m_2, \text{ the rejection region is } W \le c, \text{ where } P(W \le c) = \alpha.$$

We will illustrate the foregoing procedure with the following example. 




Example 12.4.2 
Comparison of the prices (in dollars) of two brands of similar automobile tires resulted in the data in 
Table 12.7. 


Table 12.7 

Tire I:    85   99   100   110   105   87 
Tire II:   67   69   70   93   105   90   110   115 


Use the Wilcoxon rank sum test with α = 0.05 to test the null hypothesis that the two population medians 
are the same against the alternative hypothesis that the population medians are different. 


Solution 
Here we need to test 


$$H_0: m_1 = m_2 \quad\text{versus}\quad H_a: m_1 \neq m_2.$$


The sample sizes are $n_1 = 6$ and $n_2 = 8$. Combining step 1 and step 2, we have the results shown in 
Table 12.8. 

Table 12.8 

Value        67   69   70   85   87   90   93   99   100   105    105    110    110    115 
Population   II   II   II   I    I    II   II   I    I     I      II     I      II     II 
Rank         1    2    3    4    5    6    7    8    9     10.5   10.5   12.5   12.5   14 


The sum of ranks of observations from population II is R = 56. Hence, the test statistic is 

$$W = R - \frac{1}{2} n_2 (n_2 + 1) = 56 - \frac{1}{2}(8)(9) = 20.$$

For α = 0.05, the rejection region is W < 9 or W > 38, with the actual α being 0.0592. Because the 
observed value of the test statistic does not fall in the rejection region, $H_0$ is not rejected. Thus, we do not 
have enough evidence to conclude that the median prices are different for these two brands of automobile 
tires. 
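
The ranking with averaged ties and the statistic W can be sketched in Python as below. The helper names `midrank` and `rank_sum_w` are hypothetical; the sketch assumes the definition W = R - n_2(n_2 + 1)/2 given in the procedure, with R the rank sum of the second sample.

```python
def midrank(x, pooled):
    """Rank of x in the pooled sample, averaging the ranks of tied values."""
    less = sum(1 for v in pooled if v < x)
    equal = sum(1 for v in pooled if v == x)
    return less + (equal + 1) / 2

def rank_sum_w(sample1, sample2):
    """Return (R, W), where R is the rank sum of sample2 in the pooled sample."""
    pooled = sample1 + sample2
    r = sum(midrank(x, pooled) for x in sample2)
    n2 = len(sample2)
    return r, r - n2 * (n2 + 1) / 2

tire1 = [85, 99, 100, 110, 105, 87]
tire2 = [67, 69, 70, 93, 105, 90, 110, 115]

r, w = rank_sum_w(tire1, tire2)
print(r, w)
```

The critical values themselves still come from the Wilcoxon rank sum table; the sketch only reproduces R and W.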


When the sample sizes are large and when $H_0$ is true, the distribution of the Wilcoxon rank sum test 
can be approximated by the normal distribution. It can be shown that under $H_0$, when both $n_1$ and 
$n_2$ are greater than 10, the distribution of W is approximately normal with 

$$E(W) = \frac{n_1 n_2}{2} \quad\text{and}\quad \mathrm{Var}(W) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}.$$

For a large random sample, we can summarize the test procedure as follows. 


SUMMARY OF LARGE SAMPLE WILCOXON RANK SUM TEST ($n_1 > 10$ AND $n_2 > 10$) 


We test 

$$H_0: m_1 = m_2 \quad\text{versus}\quad H_a: \begin{cases} m_1 > m_2, & \text{upper tailed test} \\ m_1 < m_2, & \text{lower tailed test} \\ m_1 \neq m_2, & \text{two-tailed test.} \end{cases}$$

The test statistic: 

$$z = \frac{W - n_1 n_2 / 2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}}.$$

Rejection region: 

$$\begin{cases} z > z_\alpha, & \text{upper tail RR} \\ z < -z_\alpha, & \text{lower tail RR} \\ |z| > z_{\alpha/2}, & \text{two tail RR.} \end{cases}$$

Assumption: The samples are independent and $n_1 > 10$ and $n_2 > 10$. 


Decision: Reject $H_0$ if the test statistic falls in the RR, and conclude that $H_a$ is true with $(1 - \alpha)100\%$ 
confidence. Otherwise, do not reject $H_0$, because there is not enough evidence to conclude that $H_a$ is true 
for a given α and more data are needed. 


We will use the foregoing procedure to solve the following problem. 


Example 12.4.3 
In an effort to determine the immunoglobulin D (IgD) levels of a certain ethnic group, a large number of 
blood samples representing both sexes for 12-year-olds were taken. The following sample data give the 
IgD levels (in mg/100 mL). 


Male:     9.3   0.0   12.2   8.1   5.7   6.8   3.6   9.4   8.5   7.3   9.7 
Female:   7.1   0.0   5.9   7.6   2.8   5.8   7.2   7.4   3.5   3.3   7.5   7.0 


Use the large sample Wilcoxon rank sum test with the significance level α = 0.01 to test the hypothesis 
that there is no difference between the sexes in the median level of IgD. 


Solution 
We need to test 


$$H_0: m_1 = m_2 \quad\text{versus}\quad H_a: m_1 \neq m_2.$$


Here, $n_1 = 11$ and $n_2 = 12$, and the results of step 1 and step 2 are given in Table 12.9, where we use M 
or F to identify the population from which the data are coming. 

Table 12.9 

Value    0.0   0.0   2.8   3.3   3.5   3.6   5.7   5.8   5.9   6.8   7.0   7.1 
M or F   M     F     F     F     F     M     M     F     F     M     F     F 
Rank     1.5   1.5   3     4     5     6     7     8     9     10    11    12 

Value    7.2   7.3   7.4   7.5   7.6   8.1   8.5   9.3   9.4   9.7   12.2 
M or F   F     M     F     F     F     M     M     M     M     M     M 
Rank     13    14    15    16    17    18    19    20    21    22    23 

The sum of the ranks for females is R = 114.5, and 

$$W = R - \frac{1}{2} n_2 (n_2 + 1) = 114.5 - \frac{1}{2}(12)(13) = 36.5.$$
Therefore, the test statistic results in 

$$z = \frac{W - n_1 n_2 / 2}{\sqrt{n_1 n_2 (n_1 + n_2 + 1)/12}} = \frac{36.5 - (11)(12)/2}{\sqrt{(11)(12)(24)/12}} = -1.8156.$$

For α = 0.01, we have $z_{\alpha/2} = z_{0.005} = 2.575$. Hence, the rejection region is z < -2.575 or z > 2.575. 
Because the test statistic does not fall in the rejection region, we do not reject $H_0$ at α = 0.01 and conclude 
that there is not enough evidence to conclude that there is any difference between the sexes in the median 
level of IgD. 
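
The normal approximation step can be checked numerically with the following Python sketch. The function name `rank_sum_z` is hypothetical, and the sketch assumes W has already been computed from the ranks as in the solution; it applies the mean and variance of W stated for the large sample case.

```python
import math

def rank_sum_z(w, n1, n2):
    """Large sample z for the Wilcoxon rank sum statistic W."""
    mean = n1 * n2 / 2                           # E(W) under H0
    var = n1 * n2 * (n1 + n2 + 1) / 12           # Var(W) under H0
    return (w - mean) / math.sqrt(var)

# Values from the IgD example: R = 114.5 for the females, n1 = 11, n2 = 12.
n1, n2 = 11, 12
w = 114.5 - n2 * (n2 + 1) / 2
z = rank_sum_z(w, n1, n2)
print(w, round(z, 4))
```

For the two-tailed test at α = 0.01, |z| is compared with 2.575.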


With a slight modification of the ranking system in the Wilcoxon rank sum test, we could test for the 
equality of variances when the normality assumption of the F-test fails. 


EXERCISES 12.4 


12.4.1. The following data give the winning proportions of the top six football teams from each 
of the two conferences of the NFL. 


American Conference | 0.818 | 0.727 | 0.909 | 0.818 | 0.727 | 0.545 
National Conference | 0.636 | 0.545 | 0.636 | 0.636 | 0.818 | 0.455 


Use the Wilcoxon rank sum test at the significance level of 0.05 to test the null hypothesis 
that the two samples come from populations with identical medians against the alternative 
hypothesis that the medians are not equal. State any assumptions you have made to solve 
the problem. 


12.4.2. Comparison of two protective methods against corrosion yielded the following maximum 
depths of pits (in thousandths of an inch) in pieces of similar metals subjected to the 
respective treatments: 


Method I: | 68 | 75 | 69 | 75 | 70 | 69 | 72 
Method II: | 61 | 65 | 57 | 63 | 58 


Use the Wilcoxon rank sum test at the significance level of 0.01 to test the null hypothesis 
that the two samples have identical medians against the alternative hypothesis that the 
medians are not equal. 


12.4.3. Show that when $H_0$ is true, the mean and variance of the Wilcoxon rank sum test with 
sample sizes $n_1$ and $n_2$ are 

$$E(W) = \frac{n_1 n_2}{2} \quad\text{and}\quad \mathrm{Var}(W) = \frac{n_1 n_2 (n_1 + n_2 + 1)}{12}.$$




12.4.4. In order to make inferences about the temporal muscles of the cat, a certain dose of tubocu- 
rarine is injected into a random sample of nine cats. The following data give the tetanus 
frequency (in hertz) in the temporal (T) muscles before and after injection of tubocurarine. 


T before   24   33   27   23   31   28   31   24   19 
T after    27   38   34   32   37   28   35   28   41 


Use the Wilcoxon rank sum test at the significance level of 0.05 to test the null hypothesis 
that the median tetanus frequency (in hertz) in the temporal (T) muscles is larger after 
injection of tubocurarine. State any assumptions you made to solve the problem. 


12.4.5. In a study of the net conversion of progesterone in rat liver, the following samples were 
obtained for the net conversion in rats 3 to 4 weeks old: 


Male:     16.9   16.0   13.5   13.1   14.2   11.6   12.8   17.3   13.8   9.8   16.0   15.9   16.7   15.1 
Female:   13.8   11.2   7.5   10.4   15.8   14.5   9.5   9.8   5.1   15.5   6.5   7.2 


Use the large sample Wilcoxon rank sum test at the significance level of 0.05 to test the 
hypothesis that the median net conversion of progesterone in male rats is larger than that 
in female rats. What would be your conclusion if you were to use the median test? 


12.4.6. Two groups of randomly selected 1-acre plots were treated with two different brands of 
fertilizer. The following data give the yields of corn (in bushels) from each of these plots. 


Fertilizer I: | 89 | 93 | 105 | 94 | 92 | 96 | 93 | 101 
Fertilizer II: | 85 | 88 | 94 | 87 | 86 | 91 


Use the data to determine whether there is a difference in yields for two brands of fertilizers. 
Use α = 0.01. State any assumptions you made to solve the problem. 


12.4.7. The following information is obtained from two independent samples. 


Sample 1:   15   8   12   4   10   8   13   7   12   6   14   11 
Sample 2:   18   13   15   19   17   13   17   16 


Test at 1% significance level that the median for sample 1 is less than the median for sample 
2 and interpret the meaning of your result. 


12.5 NONPARAMETRIC HYPOTHESIS TESTS FOR k > 2 SAMPLES 


In this section we learn how to compare the medians of more than two independent samples and 
to determine whether medians of the groups differ. These tests are nonparametric alternatives to the 
ANOVA methods discussed in Chapter 10. We study the Kruskal-Wallis test and Friedman test. Both of 
these methods test the equality of the treatment medians. 




12.5.1 The Kruskal-Wallis Test 


The Kruskal-Wallis test is a generalization of the Wilcoxon rank sum test for two independent samples 
to several independent samples. This test is a nonparametric alternative to one-way ANOVA. The 
Kruskal-Wallis test is almost as powerful as the one-way ANOVA when the data are from a normal 
distribution, and more powerful in case of nonnormality or in the presence of outliers. We now 
describe this test. 


Suppose that we have k populations, with $\theta_i$ being the median of population i, and k independent 
random samples from these populations. Let the size of the sample from the ith population be $n_i$. We wish to 
test the equality of the medians of different groups, that is, to test the hypothesis 

$$H_0: \theta_1 = \theta_2 = \cdots = \theta_k = 0 \quad\text{versus}\quad H_a: \text{Not all } \theta_i\text{'s equal } 0.$$

We shall show that the hypothesis $\theta_1 = \cdots = \theta_k$ is equivalent to the hypothesis $H_0: \theta_1 = \theta_2 = \cdots = \theta_k = 0$. 
Let $\theta_1 = \cdots = \theta_k = t$ (same number). Then the observations $y_{ij} - t$ ($i = 1, 2, \ldots, k$) will be 
from a population with median zero. Because the Kruskal-Wallis test procedure depends only on the 
ranks of the $y_{ij}$ values in the combined sample and the ranks of the $(y_{ij} - t)$ values are identical to those of 
the $y_{ij}$ values, the two hypotheses are equivalent. 


We summarize the Kruskal-Wallis procedure to solve this type of problem in the following steps. 


KRUSKAL-WALLIS TEST PROCEDURE 
1. Combine and rank all $N = \sum_{i=1}^{k} n_i$ observations $y_{ij}$ in ascending order. Also keep track of the groups 
from which the observations come. Assign average ranks in case of ties. Let 
$r_{ij} = \mathrm{rank}(y_{ij})$. 

2. Calculate the group sums 

$$r_i = \sum_{j=1}^{n_i} r_{ij}, \quad i = 1, 2, \ldots, k,$$

and the group averages 

$$\bar{r}_i = \frac{r_i}{n_i}.$$

3. Let 

$$r = \sum_{i=1}^{k} r_i = \frac{N(N + 1)}{2}$$

(this can be used as a check for accuracy of your calculation of the $r_i$) and let 

$$\bar{r} = \frac{r}{N} = \frac{N + 1}{2}.$$

4. Calculate the Kruskal-Wallis test statistic 

$$H = \frac{12}{N(N + 1)} \sum_{i=1}^{k} n_i (\bar{r}_i - \bar{r})^2$$

or the convenient computational form of H, 

$$H = \frac{12}{N(N + 1)} \sum_{i=1}^{k} \frac{r_i^2}{n_i} - 3(N + 1).$$

Note that to compute the convenient form of H, there is no need to calculate $\bar{r}_i$ and $\bar{r}$. 
5. Reject $H_0$ if 

$$H > c,$$

where the constant c is chosen to achieve a specified value for α. 


The exact distribution of H is complicated. It depends on the sample sizes $n_1, n_2, \ldots, n_k$, and so it 
is not practical to tabulate its values beyond a small number of cases. When k or N is large, the exact 
distribution of H under the null hypothesis can be approximated by the chi-square distribution with 
(k - 1) degrees of freedom. To this effect, we state the Kruskal-Wallis theorem without proof. 


Theorem 12.5.1 When $H_0: \theta_1 = \theta_2 = \cdots = \theta_k$ is true, then as N becomes large, the statistic 

$$H = \frac{12}{N(N + 1)} \sum_{i=1}^{k} n_i (\bar{r}_i - \bar{r})^2$$

has an asymptotic distribution that is chi-square with (k - 1) degrees of freedom. 


Thus, for approximate large samples the Kruskal-Wallis test for a given α is to reject $H_0$ if 

$$H > \chi^2_{\alpha}(k - 1).$$

The chi-square approximation is acceptable when the group sample sizes $n_i > 5$ with $k > 3$. However, 
for convenience, we will use the chi-square approximation for all values of $n_i$. For this test, we follow 
the procedure described earlier except that for finding the rejection region, we use the chi-square 
table. 


The following example illustrates how we use the foregoing procedure to test the appropriate 
hypothesis for three populations. 


Example 12.5.1 


In an effort to investigate the premium charged by insurance companies for auto insurance, an agency 
randomly selects a few drivers who are insured with three different companies. Assume that these persons 
have similar autos, driving records, and level of coverage. Table 12.10 gives the premiums paid per 6 months 
by these drivers with these three companies. Using the 5% level of significance, test the null hypothesis that 
the median auto insurance premium paid per 6 months by all drivers insured with each of these companies 
is the same. 

Table 12.10 

Company I    Company II    Company III 
396          348           378 
438          360           330 
336          522           294 
318                        474 
                           432 


Solution 


Here we need to test 


$$H_0: M_1 = M_2 = M_3 = 0 \quad\text{versus}\quad H_a: \text{Not all } M_i\text{'s equal } 0,$$

where $M_i$ is the true median of the auto insurance premium paid to company i, i = 1, 2, 3. 


Here $n_1 = 4$, $n_2 = 3$, and $n_3 = 5$. Hence, there are $N = \sum_{i=1}^{3} n_i = 12$ observations. Let Y denote the 
observations in ascending order. Table 12.11 gives the combined data in ascending order while keeping track 
of the groups and their ranks. 


Table 12.11 

Premium   294   318   330   336   348   360   378   396   432   438   474   522 
Group     3     1     3     1     2     2     3     1     3     1     3     2 
Rank      1     2     3     4     5     6     7     8     9     10    11    12 


Thus, the group rank sums are 
$r_1 = 24$, $r_2 = 23$, and $r_3 = 31$. 


As a check for accuracy of these calculations, note that 

$$r_1 + r_2 + r_3 = 78 = \frac{N(N + 1)}{2} = \frac{(12)(13)}{2}.$$
The test statistic is given by 

$$H = \frac{12}{N(N + 1)} \sum_{i=1}^{3} \frac{r_i^2}{n_i} - 3(N + 1) = \frac{12}{(12)(13)} \left( \frac{(24)^2}{4} + \frac{(23)^2}{3} + \frac{(31)^2}{5} \right) - 3(13) = 0.42564.$$

From the chi-square table, $\chi^2_{0.05}(2) = 5.991$, and hence the rejection region is H > 5.991. Because the 
observed value of H does not fall in the rejection region, we do not reject $H_0$ and conclude that there is no 
evidence to show that the median auto insurance premiums paid per 6 months by all drivers insured in each 
of these companies are different. 
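
The Kruskal-Wallis computation can be sketched in Python using the convenient computational form of H. The function name `kruskal_wallis_h` is a hypothetical helper; it assigns average ranks to ties as in step 1 but, as an assumption of this sketch, applies no further tie correction to H.

```python
def kruskal_wallis_h(groups):
    """Return (group rank sums, H) using the computational form of H."""
    pooled = [y for g in groups for y in g]
    n_total = len(pooled)

    def midrank(x):
        # Rank in the pooled sample, averaging the ranks of tied values.
        less = sum(1 for v in pooled if v < x)
        equal = sum(1 for v in pooled if v == x)
        return less + (equal + 1) / 2

    rank_sums = [sum(midrank(y) for y in g) for g in groups]
    h = 12 / (n_total * (n_total + 1)) * sum(
        r * r / len(g) for r, g in zip(rank_sums, groups)
    ) - 3 * (n_total + 1)
    return rank_sums, h

company1 = [396, 438, 336, 318]
company2 = [348, 360, 522]
company3 = [378, 330, 294, 474, 432]

rank_sums, h = kruskal_wallis_h([company1, company2, company3])
print(rank_sums, round(h, 5))
```

The rank sums should add up to N(N + 1)/2, which gives a quick check before H is compared with the chi-square critical value.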


12.5.2 The Friedman Test 

The Friedman test, named after the Nobel laureate economist Milton Friedman, tests whether several 
treatment effects (measured as locations) are equal for data in a two-way layout. We will assume that 
there are k different treatment levels and l blocks. In each block, assign one experimental unit to each 
treatment level. We want to test whether the true medians for different treatment levels are the same 
in each block, that is, to test 


$H_0$: True medians at different levels are all equal 
versus 
$H_a$: Not all the medians are equal. 


Rather than combine the entire sample as in the Kruskal-Wallis statistic, here we order the 
y-values within each block and then assign each its rank. In order to eliminate the differences due 
to blocks, we take the sum of ranks for each treatment level. The following gives a summary of the 
procedure. 


THE FRIEDMAN TEST PROCEDURE 
1. Rank observations from k treatments separately within each block. Assign average ranks in case of 
ties. Let $R_{ij} = \mathrm{rank}(Y_{ij})$, the rank of the observation for treatment level i in block j. 
2. Calculate the rank sums 

$$R_i = \sum_{j=1}^{l} R_{ij}, \quad i = 1, 2, \ldots, k.$$

3. Calculate the Friedman statistic 

$$S = \frac{12}{l k (k + 1)} \sum_{i=1}^{k} \left( R_i - \frac{l(k + 1)}{2} \right)^2$$

or a convenient computational form, 

$$S = \frac{12}{l k (k + 1)} \sum_{i=1}^{k} R_i^2 - 3l(k + 1).$$

4. Reject $H_0$ if $S \ge c$, where the constant c is chosen to achieve a specified value for α. 


The exact distribution of S is complicated. Here, for k = 3, 4, 5, and for various values of l, the 
Friedman distribution has been calculated and its values are given in the table in Appendix A7. 
We will illustrate this four-step procedure with an example. 


Example 12.5.2 
Three classes in elementary statistics are taught by three different persons, a regular faculty member, a 
graduate teaching assistant, and an adjunct from outside the university. At the end of the semester, each 
student is given a standardized test. Five students are randomly picked from each of these classes, and their 
scores are given in Table 12.12. Test whether there is a difference between the scores for the three persons 
teaching with α = 0.05. 


Table 12.12 
Faculty Teaching assistant Adjunct 
93 88 86 
61 90 56 
87 76 73 
75 82 90 
92 58 47 


Solution 
Here we need to test 


$H_0$: The median scores for the three instructors are all equal 
versus 
$H_a$: The medians are not all equal. 


We are given α = 0.05, k = 3, and l = 5. To compute the value of the statistic S, we first assign ranks for 
each student as shown in Table 12.13. 
Table 12.13 
Faculty Teaching assistant Adjunct 
3 2 1 
2 3 1 
3 2 1 
1 2 3 
3 2 1 


Thus, we have 
$R_1 = 12$, $R_2 = 11$, and $R_3 = 7$, 


and the test statistic is given by 

$$S = \frac{12}{l k (k + 1)} \sum_{i=1}^{k} R_i^2 - 3l(k + 1) = \frac{12}{(5)(3)(4)} \left( (12)^2 + (11)^2 + (7)^2 \right) - (3)(5)(4) = 2.8.$$


From the Friedman table, the rejection region is S ≥ 5.20 at an exact significance level of 0.092. Because the 
computed value of the test statistic does not fall in the rejection region, we do not reject $H_0$ and conclude 
that there is no difference in scores based on who teaches the course. 


When the number of blocks, l, becomes large, the Friedman test statistic has an approximate chi-square 
distribution under the null hypothesis. That is: 


Theorem 12.5.2 When $H_0: \theta_1 = \theta_2 = \cdots = \theta_k$ is true, then as l becomes large, 

$$S = \frac{12}{l k (k + 1)} \sum_{i=1}^{k} \left( R_i - \frac{l(k + 1)}{2} \right)^2$$

has an asymptotic distribution that is chi-square with (k - 1) degrees of freedom. 


Thus, for an approximate large random sample, the Friedman test for a given α is to reject $H_0$ if 
$S > \chi^2_{\alpha}(k - 1)$. 


When the values of k and l exceed the values given in the Friedman table, we could use the chi-square 
approximation, which gives acceptable results. We proceed to illustrate the Friedman test with the 
following example. 


Example 12.5.3 
In the previous example, we now randomly select 10 student grades from each class, resulting in the data 
shown in Table 12.14. 


Table 12.14 

Faculty Teaching assistant Adjunct 
93 88 86 
61 90 56 
87 76 73 
75 82 90 
92 58 47 
45 74 88 
99 23 77 
86 61 18 
82 60 66 
74 77 55 


Test whether there is a difference between the scores for the three persons teaching. Use α = 0.05. 


Solution 
Here we need to test 


$H_0$: The true median scores for the three instructors are all equal 
versus 
$H_a$: They are not all equal. 


We are given α = 0.05, k = 3, and l = 10. We use the chi-square approximation to solve the problem. To 
compute the value of the statistic S we first assign ranks for each student as shown in Table 12.15. 
The Friedman test statistic is 

$$S = \frac{12}{l k (k + 1)} \sum_{i=1}^{k} R_i^2 - 3l(k + 1) = \frac{12}{(10)(3)(4)} \left( (24)^2 + (20)^2 + (16)^2 \right) - (3)(10)(4) = 3.2.$$
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Table 12.15 
Faculty Teaching assistant Adjunct 

3 2 1 
2 3 1 
3 2 1 
1 2 3 
3 2 1 
1 2 3 
3 1 2 
3 2 1 
3 1 2 
2 3 1 

Total 24 20 16 


From the chi-square table, $\chi^2_{0.05}(2) = 5.992$. Hence, the rejection region is $S > 5.992$. The computed value
of the test statistic does not fall in the rejection region, and we do not reject $H_0$. We conclude that there is
no difference in scores based on who teaches the course.
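The hand calculation in Example 12.5.3 can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function names are hypothetical, it assumes there are no ties within a block, and the chi-square tail probability uses a closed form that is valid only for an even number of degrees of freedom (here $k - 1 = 2$).

```python
import math

def friedman_statistic(blocks):
    """Return (S, rank_sums) for blocks of k responses each; assumes no ties within a block."""
    l = len(blocks)            # number of blocks
    k = len(blocks[0])         # number of treatments
    rank_sums = [0] * k
    for row in blocks:
        # Rank 1 goes to the smallest response in the block.
        order = sorted(range(k), key=lambda j: row[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    s = 12.0 / (l * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * l * (k + 1)
    return s, rank_sums

def chi2_upper_tail_even_df(x, df):
    """P(chi-square_df > x); this closed form holds only when df is even."""
    if df % 2 != 0:
        raise ValueError("closed form requires even degrees of freedom")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

# Table 12.14: (Faculty, Teaching assistant, Adjunct) for each of the 10 students.
scores = [
    (93, 88, 86), (61, 90, 56), (87, 76, 73), (75, 82, 90), (92, 58, 47),
    (45, 74, 88), (99, 23, 77), (86, 61, 18), (82, 60, 66), (74, 77, 55),
]

S, R = friedman_statistic(scores)
p_value = chi2_upper_tail_even_df(S, df=2)
print(R, round(S, 2), round(p_value, 3))  # [24, 20, 16] 3.2 0.202
```

The rank sums match Table 12.15, and the p-value of about 0.202 exceeds 0.05, which agrees with the decision reached from the chi-square table.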


Friedman's test is an alternative to the repeated measures ANOVA when assumptions such as normality or
equality of variance are not satisfied. Because this test, like many other nonparametric tests, does not
make a distributional assumption, it is generally less powerful than the ANOVA when the ANOVA
assumptions do hold.


EXERCISES 12.5 


12.5.1. Table 12.16 shows a random sample of observations on children under 10 years of age, each 
observation being the IgA immunoglobulin level measured in international units from a 
large number of blood samples, and the population is studied in blocks in terms of age 
groups (the upper value is not included) as I: (1 to 3), II: (3 to 6), III: (6 to 8), and IV:
(8 to 10). Test for the hypothesis of equality of true medians for IgA level in each block (age 
level), (a) with the 5% level and (b) with the 1% level of significance. Compare the results 
obtained. 


Table 12.16

I     6   37  19  14  51  68   27  75
II    32  65  76  42  45  41   38  63
III   73  75  59  90  37  32   63  80
IV    81  42  48  60  98  100  79  45


12.5.2. In an effort to study the effect of four different preventive maintenance programs on
downtimes (in minutes) for a certain period of time in a production line, a factory runs four
parallel production lines, and each line has five different types of machine. The different
maintenance programs are randomly assigned to each of the four production lines so as to
treat the various machines as blocks. Results are shown in Table 12.17.


Table 12.17
Machine   Method 1   Method 2   Method 3   Method 4
I         181        124        126        181
II        185        122        125        160
III       67         65         68         69
IV        121        66         120        68
V         62         60         62         65

Test the hypothesis $H_0$: True medians of the four maintenance programs are equal versus
$H_a$: Not all are equal. [Hint: In the Friedman test, $k = 4$ and $l = 5$.] State any assumptions
you have made to solve this problem.


12.5.3. Show that, when $k = 2$, the Kruskal-Wallis statistic,

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1),$$

becomes equivalent to the Wilcoxon rank sum test.


12.5.4. A consumer testing agency is interested in determining whether there is a difference in the 
mileage for three brands of gasoline. To test this, four different vehicles are driven with each 
of these gasolines. Results are shown in Table 12.18. 


Test whether there is a difference between the three gasoline medians at the 0.05 level. 


12.5.5. In order to study the effect of fertilizers, five groups of 1-acre plots were randomly selected. 
One group was not treated with any fertilizers and the remaining four groups were treated 
with four different brands of fertilizers. Table 12.19 gives the yields of corn (in bushels) 
from each of these plots. 
Table 12.18
            Gasoline
Vehicle   A    B    C
I         19   25   22
II        26   33   39
III       20   28   25
IV        18   30   21

Table 12.19
None:             58   27   36   41   48   36   50   50   39
Fertilizer I:     69   67   57   63   49   65   78   69
Fertilizer II:    95   92   92   89   100  88   79   97   75
Fertilizer III:   102  111  92   103  102  94   100  112  96
Fertilizer IV:    127  115  112  122  114  107  116  112  108


Use the data to determine whether there is a difference in yields for different fertilizers. Use
$\alpha = 0.01$.


12.5.6. In order to compare grocery prices of four different grocery stores on a particular day in
November 1999, the prices of 11 randomly selected items of the same brands are given in Table 12.20.
Use the data to determine whether there is a difference in prices at these four grocery store
chains. Use $\alpha = 0.01$. State any assumptions you have made to solve this problem.


Table 12.20

Product                   Store A   Store B   Store C   Store D
Bread (20 oz)             $1.39     $1.39     $1.39     $1.39
Red apple (1 lb)          1.29      1.29      0.99      0.68
Large eggs (1 dozen)      0.69      0.88      0.89      0.89
Orange juice (64 oz)      3.29      2.99      2.79      2.69
Cereal (15 oz)            3.59      3.19      3.19      3.58
Canned corn (15.25 oz)    0.50      0.53      0.50      0.49
Crystals sugar (5 lb)     1.99      2.09      1.99      1.89
2% milk (1 gal)           3.19      3.19      3.09      3.09
Frozen pizza (21.5 oz)    3.00      4.59      3.50      3.50
Puppy Chow (4.4 lb)       4.59      3.69      3.69      3.99
Diapers (56-pack)         12.99     12.99     12.99     11.88


12.6 CHAPTER SUMMARY


In this chapter, we first learned about nonparametric approaches to interval estimation and
nonparametric hypothesis tests for one sample, such as the sign test, the Wilcoxon signed rank test,
and dependent sample paired comparison tests. Then nonparametric hypothesis tests for two
independent samples, such as the median test and the Wilcoxon rank sum test, were considered. Later the
Kruskal-Wallis test and the Friedman test were explained for more than two samples.


It is natural to ask, “Why do we substitute a set of nonnormal numbers, such as ranks, for the original
data?” Few data are truly normal. Rank tests are sometimes called “approximate” tests. They are most
useful in instances when we suspect that the data are not normal, and we either cannot transform
the data to make them more normal or prefer not to do so. One simple way to check whether a
nonparametric test is appropriate is to construct a stem-and-leaf display or a
histogram for the sample data and see whether they look symmetric and approximately bell shaped.
If this is not so, we may often be better off using a nonparametric approach.


Since the 1940s, many nonparametric procedures have been introduced, and the number of
procedures continues to grow. The nonparametric tests presented in this chapter represent only a small
portion of available nonparametric tests. There are many references available in the bibliography for
further reading on the subject.


In this chapter, we have also learned the following important concepts and procedures. 


■ Procedure for finding a $(1 - \alpha)100\%$ confidence interval for the median $M$
■ Hypothesis testing procedure by sign test
■ A large sample sign test
■ Hypothesis testing procedure by Wilcoxon signed rank test
■ Summary of large sample Wilcoxon signed rank test ($n > 20$)
■ Summary of large sample median test ($n_1 > 5$ and $n_2 > 5$)
■ Hypothesis testing procedure by Wilcoxon rank sum test
■ Summary of large sample Wilcoxon rank sum test ($n_1 > 10$ and $n_2 > 10$)
■ Kruskal-Wallis test procedure
■ Friedman test procedure


12.7 COMPUTER EXAMPLES 


In this section, we illustrate some nonparametric procedures using statistical software packages. 


12.7.1 Minitab Examples 




Example 12.7.1 
(One-sample sign): For the data


1.51 1.35 1.69 1.48 1.29 1.27 1.54 1.39 1.45


test $H_0: M = 1.4$ versus $H_a: M > 1.4$, using sign test.


Solution 
Enter data in C1. Then 


Stat > Nonparametric > 1-Sample Sign... > in Variables: type C1 > click Test median: type 1.4 
> in Alternative: click greater than > click OK 


We will get the following output. 


Sign Test for Median 


Sign test of median = 1.400 versus > 1.400 


      N   Below   Equal   Above   P        Median
C1    9   4       0       5       0.5000   1.450


Looking at the p-value of 0.500, we will not be able to reject the null hypothesis for any reasonable value
of $\alpha$.
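The p-value in this output can be reproduced directly from the binomial distribution. The following is a minimal sketch of the exact upper-tailed sign test; the function name is hypothetical, and observations equal to the hypothesized median are assumed to be dropped.

```python
from math import comb

def sign_test_upper(data, m0):
    """Exact sign test of H0: M = m0 versus Ha: M > m0."""
    above = sum(1 for x in data if x > m0)
    below = sum(1 for x in data if x < m0)
    n = above + below  # observations equal to m0 are discarded
    # Under H0 the number of values above m0 is Binomial(n, 1/2).
    p_value = sum(comb(n, j) for j in range(above, n + 1)) / 2 ** n
    return above, below, p_value

data = [1.51, 1.35, 1.69, 1.48, 1.29, 1.27, 1.54, 1.39, 1.45]
above, below, p_value = sign_test_upper(data, 1.4)
print(above, below, p_value)  # 5 4 0.5
```

With five of the nine values above 1.4, the upper-tail probability is 256/512 = 0.5, matching the Minitab result.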


We can obtain the nonparametric confidence interval using the following procedure. Enter the data in
C1, and then


Stat > Nonparametric > 1-Sample Sign... > in Variables: type C1 > click Confidence interval > in 
level: enter appropriate, say, 95.0 > Click OK 
Example 12.7.2
(One-sample Wilcoxon): For the data


1.51 1.35 1.69 1.48 1.29 1.27 1.54 1.39 1.45
test $H_0: M = 1.4$ versus $H_a: M \neq 1.4$, using one-sample Wilcoxon test.


Solution 
We will give only Sessions commands; the Windows procedure is similar to the previous example. 


Stat > Nonparametric > 1-Sample Wilcoxon... > in Variables: type C1 > click Test median: type 
1.4 > in Alternative: click not equal > click OK 


We will get the following output. 


Wilcoxon Signed Rank Test 
Test of median = 1.400 versus median not = 1.400 


      N   N for Test   Wilcoxon Statistic   P       Estimated Median
C1    9   9            29.0                 0.477   1.435


Looking at the p-value of 0.477, we will not be able to reject the null hypothesis for any reasonable value
of $\alpha$.


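To see where the Wilcoxon statistic of 29.0 comes from, the signed rank calculation can be sketched as follows. This is an illustrative implementation under stated assumptions: the helper names are hypothetical, tied absolute differences receive average ranks, differences are rounded to avoid floating-point noise when detecting ties, and the p-value uses the normal approximation with a continuity correction and no tie correction, which may differ slightly from what a given package does.

```python
import math

def midranks(values):
    """Ranks 1..n, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2.0 + 1.0
        for t in range(i, j + 1):
            ranks[order[t]] = average
        i = j + 1
    return ranks

def wilcoxon_signed_rank(data, m0):
    """Return (W+, n, two-sided p) using a normal approximation."""
    # Round so that differences such as 0.05 and -0.05 compare as exact ties.
    diffs = [round(x - m0, 10) for x in data]
    diffs = [d for d in diffs if d != 0]
    ranks = midranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    n = len(diffs)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = max(0.0, abs(w_plus - mean) - 0.5) / sd   # continuity correction
    p_two_sided = math.erfc(z / math.sqrt(2))      # equals 2 * (1 - Phi(z))
    return w_plus, n, p_two_sided

data = [1.51, 1.35, 1.69, 1.48, 1.29, 1.27, 1.54, 1.39, 1.45]
w_plus, n, p_value = wilcoxon_signed_rank(data, 1.4)
print(w_plus, n, round(p_value, 3))
```

The sum of the positive ranks is 29.0, and the approximate two-sided p-value is close to the 0.477 reported in the output.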
Example 12.7.3 
(Two-sample sign test): For the data 


Sample 1 | 180 | 199 | 175 | 226 | 189 | 205 | 169 | 211 
Sample 2 | 172 | 191 | 172 | 230 | 178 | 199 | 171 | 201 


test $H_0: M = 0$ versus $H_a: M < 0$, using the two-sample sign test, where $M$ is the median of the
difference. Use $\alpha = 0.05$.


Solution 
After entering sample 1 data in C1 and sample 2 data in C2, we can use the following sequence:


Calc > Calculator... > in Store result in variable: type C3 > in Expression: type C2-C1 > click OK


We will get the pairwise difference of the two samples. For these values, we will apply the one-sample sign 
test. 


Stat > Nonparametric > 1-Sample Sign... > in Variables: type C3 > click Test median: type 0 > in
Alternative: choose less than > click OK


We will get the following output:


Sign Test for Median 

Sign test of median = 0.00000 versus < 0.00000 

      N   Below   Equal   Above   P        Median
C3    8   6       0       2       0.1445   -7.000


Because the p-value is 0.1445, which is greater than 0.05, we will not reject the null hypothesis.


Example 12.7.4 
(Kruskal-Wallis test): In an effort to investigate the premium charged by insurance companies for auto 
insurance, an agency randomly selects a few drivers who are insured by three different companies. Assume 
that these persons have similar cars, driving records, and levels of coverage. Table 12.21 gives the premiums 
paid per 6 months by these drivers with these three companies. 
Using the 5% significance level, test the null hypothesis that the median auto insurance premium paid per 
6 months by all drivers insured in each of these companies is the same. Use Minitab. 


Solution 
Enter data for company I in C1, for company II in C2, and for company III in C3. First stack the data while
keeping track of the companies in the following way.


Manip > Stack/Unstack > Stack Columns... > in Stack the following columns: type C1 C2 C3 > 
in Stored data in: type C4 > in Store subscripts in: type C5 > Click OK 


Now we can use Kruskal—Wallis as follows. 


Stat > Nonparametric > Kruskal-Wallis... > in Response: type C4 > in Factor: type C5 > click
OK


We will get the output shown in Table 12.22.
Because the p-value of 0.808 is larger than $\alpha = 0.05$, we cannot reject the null hypothesis.


Table 12.21
Company I   Company II   Company III
396         348          378
438         360          330
336         522          294
318                      474
                         432
Table 12.22 Kruskal-Wallis Test
Kruskal-Wallis Test on C4


C5        N    Median   Ave rank   Z
1         4    366.0    6.0        -0.34
2         3    360.0    7.7        0.65
3         5    378.0    6.2        -0.24
Overall   12            6.5


H = 0.43  DF = 2  P = 0.808


* NOTE * One or more small samples
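The value of H in Table 12.22 can be checked by hand or with a short script. The sketch below is an illustration only: the function name is hypothetical, no tie correction is applied (the premiums contain no ties), and the p-value uses the fact that for 2 degrees of freedom the chi-square upper-tail probability is $e^{-H/2}$.

```python
import math

def kruskal_wallis(groups):
    """Return (H, rank_sums) without a tie correction."""
    pooled = sorted(x for g in groups for x in g)

    def rank(value):
        # Average position of the value in the pooled, sorted sample.
        first = pooled.index(value) + 1
        count = pooled.count(value)
        return first + (count - 1) / 2.0

    n_total = len(pooled)
    rank_sums = [sum(rank(x) for x in g) for g in groups]
    h = (12.0 / (n_total * (n_total + 1))
         * sum(r * r / len(g) for r, g in zip(rank_sums, groups))
         - 3 * (n_total + 1))
    return h, rank_sums

# Table 12.21: premiums for companies I, II, and III.
premiums = [
    [396, 438, 336, 318],
    [348, 360, 522],
    [378, 330, 294, 474, 432],
]

H, rank_sums = kruskal_wallis(premiums)
p_value = math.exp(-H / 2.0)   # chi-square upper tail, valid for df = 2 only
print(rank_sums, round(H, 4), round(p_value, 4))
```

The rank sums 24, 23, and 31 give average ranks of 6.0, 7.67, and 6.2, in agreement with the table.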


Example 12.7.5 
(Friedman test): For the following data, conduct a Friedman test. 
93 61 87 75 92 45 99 86 82 74 
88 90 76 82 58 74 23 61 60 77 
86 56 73 90 47 88 77 18 66 55 
Solution 


Enter each row of data in C1, C2, and C3 respectively. Then stack the data in C1, C2, C3 in the 
following way. 


Manip > Stack/Unstack > Stack Columns... > in Stack the following columns: type C1 C2 C3 > 
in Stored data in: type C4 > in Store subscripts in: type C5 > Click OK 


In C6, enter numbers 1 through 10 in the first 10 rows, enter numbers 1 through 10 in the next 10 rows, and 
enter numbers 1 through 10 in the following 10 rows. Now we can use the Friedman test as follows. 


Stat > Nonparametric > Friedman... > in Response: type C4 > in Treatment: type C5 > in Blocks:
type C6 > click OK


We will get the output shown in Table 12.23.
Because the p-value is 0.202, we cannot reject the null hypothesis for any value of $\alpha < 0.202$.


Table 12.23 Friedman Test for C4 by C5 Blocked by C6
S = 3.20  DF = 2  P = 0.202


C5 N Est median Sum of ranks 
1 10 81.500 24.0 
2 10 72.000 20.0 
3 10 68.000 16.0 


Grand median = 73.833 


12.7.2 SPSS Examples 


Example 12.7.6 
(Wilcoxon rank sum test): For the data of Example 12.4.2, use the Wilcoxon rank sum test at the
significance level of 0.05 to test the null hypothesis that the two population medians are the same against the
alternative hypothesis that the population medians are different. Use an SPSS procedure.


Solution 

Because the SPSS pull-down menu does not have the Wilcoxon rank sum test, we will use the Mann—Whitney 
U-test. The Mann—Whitney U-test is equivalent to the Wilcoxon rank sum test, although we calculate it in a 
slightly different way. For the same data set, any p-values generated from one test will be identical to those 
generated from the other. The following gives the steps to follow. Enter tire brands as 1 to identify brand 1 
and 2 to identify brand 2, in C1. Enter the corresponding prices in C2. Name C1 as Brand and C2 as Price. 
Then click 


Analyze > Nonparametric Tests > 2 Independent Samples... > move Brand to Grouping
Variable: and Price to Test Variable list: > click Define Groups... > enter 1 in Group 1:, and 2 in
Group 2: > click continue > choose Mann-Whitney U > OK


We get the following output: 


Mann-Whitney Test

Ranks
         BRAND    N    Mean Rank   Sum of Ranks
PRICE    1.00     6    8.17        49.00
         2.00     8    7.00        56.00
         Total    14


Test Statistics
                                  PRICE
Mann-Whitney U                    20.000
Wilcoxon W                        56.000
Z                                 -.518
Asymp. Sig. (2-tailed)            .605
Exact Sig. [2*(1-tailed Sig.)]    .662


(a) Not corrected for ties.
(b) Grouping Variable: BRAND
In the first table just shown, ranks show the mean ranking of tire brand I and tire brand II. The Mann-Whitney
test is used to assess whether the difference between the two rank distributions is statistically significant.
Under the null hypothesis, the distribution of ranks should be the same for both groups. Looking at the second
table, the calculated value of the Mann-Whitney U is 20. The value U represents the amount by which the ranks
for tire brand I and tire brand II deviate from what we would expect under the null hypothesis. For a 0.05
significance level, we can reject the null hypothesis if the 2-tailed significance (see Asymp. Sig. in the
second table) is less than 0.05. In this case, because Asymp. Sig. (2-tailed) = 0.605, we do not reject the
null hypothesis.
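The entries in the SPSS tables can be traced with a short calculation. The sketch below is illustrative: the function name is hypothetical, tied prices receive average ranks, and the Z value is computed with a tie-corrected variance and no continuity correction, which is the combination that reproduces the Z shown above.

```python
import math

def rank_sum_test(x, y):
    """Return (W1, U, Z, two-sided p) for two independent samples."""
    pooled = sorted(x + y)

    def rank(value):
        first = pooled.index(value) + 1
        count = pooled.count(value)
        return first + (count - 1) / 2.0   # average rank for ties

    n1, n2 = len(x), len(y)
    n = n1 + n2
    w1 = sum(rank(v) for v in x)                 # rank sum of the first sample
    u1 = w1 - n1 * (n1 + 1) / 2.0
    u = min(u1, n1 * n2 - u1)                    # Mann-Whitney U
    # Variance of U with the usual correction for tied observations.
    tie_term = sum(c ** 3 - c for c in (pooled.count(v) for v in set(pooled)))
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2.0) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w1, u, z, p_two_sided

tire_1 = [85, 99, 100, 110, 105, 87]
tire_2 = [67, 69, 70, 93, 105, 90, 110, 115]

w1, u, z, p_value = rank_sum_test(tire_1, tire_2)
print(w1, u, round(z, 3), round(p_value, 3))
```

The rank sum of brand I is 49.0 and U is 20.0; the Z value and two-sided p-value come out close to -0.518 and 0.605.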


Example 12.7.7 
(Kruskal-Wallis test): For the data of Example 12.5.1, conduct the Kruskal-Wallis test using SPSS.


Solution
Enter insurance companies as 1 to identify company I, 2 to identify company II, and 3 to identify company
III, in C1. Enter the corresponding premiums in C2. Name C1 as Company, and C2 as Premium. Then:


Analyze > Nonparametric Tests > K Independent Samples... > move Premium to Test Variable
List: and Company to Grouping variable: > click Define Range... > enter 1 in Minimum, and 3 in
Maximum > click continue > click Kruskal-Wallis H > OK


We get the following output. 
Kruskal-Wallis Test


Ranks
           COMPANY   N    Mean Rank
PREMIUM    1.00      4    6.00
           2.00      3    7.67
           3.00      5    6.20
           Total     12


Test Statistics
               PREMIUM
Chi-Square     .426
df             2
Asymp. Sig.    .808


(a) Kruskal-Wallis Test
(b) Grouping Variable: COMPANY
Comparing the asymptotic significance of 0.808 with $\alpha = 0.05$, we will not reject the null hypothesis.


If we need to do a Friedman test, say for the data of Example 12.7.5, enter each row of data in C1, 
C2, and C3, respectively. Then use the following sequence to obtain the appropriate output. 


Analyze > Nonparametric Tests > K Related Samples... > move each of the three columns to Test 
Variables: > check in Test Type Friedman > OK 


12.7.3 SAS Examples 

To perform the nonparametric tests, use the SAS statement PROC NPAR1WAY. In the procedure, if
we include the EXACT statement, the program will compute the exact p-value for the
Wilcoxon rank sum test.


Example 12.7.8 
(Wilcoxon rank sum test): Comparison of the prices (in dollars) of two brands of similar tires gave the 
following data. 


Tire I:  | 85 | 99 | 100 | 110 | 105 | 87
Tire II: | 67 | 69 | 70 | 93 | 105 | 90 | 110 | 115


Use the Wilcoxon rank sum test at the significance level of 0.05 to test the null hypothesis that the two 
population medians are the same against the alternative hypothesis that the population medians are 
different. Use the SAS procedure. 


Solution 
We can use the following procedure. 


options nodate nonumber;

DATA tprice;
INPUT Brand Price @@;
CARDS;
1 85 1 99 1 100 1 110 1 105 1 87
2 67 2 69 2 70 2 93 2 105 2 90 2 110 2 115
;
TITLE 'Nonparametric statistics/Wilcoxon Rank-...';
PROC NPAR1WAY DATA=tprice WILCOXON;
CLASS Brand;
VAR Price;
EXACT WILCOXON;
run;


We will get the following output. 


Paired Wilcoxon Rank-Sum Test for Mean Comparison
(Also called as Mann-Whitney Test)


The NPAR1WAY Procedure


Wilcoxon Scores (Rank Sums) for Variable Price
Classified by Variable Brand


Brand   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
1       6   49.0            45.0                7.728924           8.166667
2       8   56.0            60.0                7.728924           7.000000


Average scores were used for ties.


Wilcoxon Two-Sample Test
Statistic (S)                     49.0000
Normal Approximation
Z                                 0.4528
One-Sided Pr > Z                  0.3253
Two-Sided Pr > |Z|                0.6507
t Approximation
One-Sided Pr > Z                  0.3291
Two-Sided Pr > |Z|                0.6581
Exact Test
One-Sided Pr >= S                 0.3200
Two-Sided Pr >= |S - Mean|        0.6407

Z includes a continuity correction of 0.5.


Kruskal-Wallis Test
Chi-Square                        0.2678
DF                                1
Pr > Chi-Square                   0.6048

The rank sums for both brands are given. The exact two-tailed p-value for this test is 0.6407, which is
greater than $\alpha = 0.05$. Hence, we will not reject the null hypothesis.


Example 12.7.9 
(Kruskal-Wallis test): For the data of Example 12.7.4, perform the Kruskal-Wallis test using SAS. 


Solution 


We can use the following code. 


options nodate nonumber;

DATA insprice;
INPUT Company Price @@;
CARDS;
1 396 1 438 1 336 1 318
2 348 2 360 2 522
3 378 3 330 3 294 3 474 3 432
;
proc npar1way data = insprice;
class company;
var Price;
run;


We will get the following output. 
The SAS System


The NPAR1WAY Procedure


Analysis of Variance for Variable Price
Classified by Variable Company


Company   N   Mean
1         4   372.00
2         3   410.00
3         5   381.60


Source   DF   Sum of Squares   Mean Square    F Value   Pr > F
Among    2    2605.80          1302.900000    0.2371    0.7937
Within   9    49459.20         5495.466667


The SAS System
The NPAR1WAY Procedure


Wilcoxon Scores (Rank Sums) for Variable Price
Classified by Variable Company


Company   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
1         4   24.0            26.00               5.887841           6.000000
2         3   23.0            19.50               5.408327           7.666667
3         5   31.0            32.50               6.157651           6.200000


Kruskal-Wallis Test
Chi-Square         0.4256
DF                 2
Pr > Chi-Square    0.8083


The SAS System
The NPAR1WAY Procedure


Median Scores (Number of Points Above Median) for Variable Price
Classified by Variable Company


Company   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
1         4   2.0             2.00                0.852803           0.500000
2         3   1.0             1.50                0.783349           0.333333
3         5   3.0             2.50                0.891883           0.600000


Median One-Way Analysis
Chi-Square         0.4889
DF                 2
Pr > Chi-Square    0.7831


The SAS System
The NPAR1WAY Procedure


Van der Waerden Scores (Normal) for Variable Price
Classified by Variable Company


Company   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
1         4   -0.492781       0.0                 1.386378           -0.123195
2         3   1.036137        0.0                 1.273470           0.345379
3         5   -0.543356       0.0                 1.449909           -0.108671


Van der Waerden One-Way Analysis
Chi-Square         0.6626
DF                 2
Pr > Chi-Square    0.7180


The SAS System
The NPAR1WAY Procedure


Savage Scores (Exponential) for Variable Price
Classified by Variable Company


Company   N   Sum of Scores   Expected Under H0   Std Dev Under H0   Mean Score
1         4   -0.817316       0.0                 1.468604           -0.204329
2         3   1.266775        0.0                 1.348999           0.422258
3         5   -0.449459       0.0                 1.535903           -0.089892


Savage One-Way Analysis
Chi-Square         0.9178
DF                 2
Pr > Chi-Square    0.6320


The SAS System
The NPAR1WAY Procedure
Kolmogorov-Smirnov Test for Variable Price
Classified by Variable Company


Company   N    EDF at Maximum   Deviation from Mean at Maximum
1         4    0.500000         0.333333
2         3    0.000000         -0.577350
3         5    0.400000         0.149071
Total     12   0.333333

Maximum Deviation Occurred at Observation 3

Value of Price at Maximum = 336.0


Kolmogorov-Smirnov Statistics (Asymptotic)
KS 0.197203   KSa 0.683130


Cramer-von Mises Test for Variable Price
Classified by Variable Company


Company   N   Summed Deviation from Mean
1         4   0.032407
2         3   0.086806
3         5   0.028009


Cramer-von Mises Statistics (Asymptotic)
CM 0.012269   CMa 0.14722


Looking at the p-value (0.8083) in the Kruskal-Wallis test, we cannot reject the null hypothesis.


PROJECTS FOR CHAPTER 12 


12A. Comparison of Wilcoxon Tests with Normal Approximation 


(i) For the Wilcoxon signed rank test, compare the results from the Wilcoxon signed rank test 
table with the normal approximation using several sets of data of various sample sizes. 
Also, if the sample size is very small, compare the results from the Wilcoxon signed rank 
test with a small sample t-test.

(ii) For the Wilcoxon rank sum test, compare the results from the Wilcoxon rank sum test 
table with the normal approximation using several sets of data (from pairs of samples) of 
various sample sizes. Also, if the sample sizes are very small, compare the results from the 
Wilcoxon rank sum test with small sample t-test for two samples. 
12B. Randomness Test (Wald-Wolfowitz Test)


When we have no control over the way in which the data are selected, it is useful to have a technique
for testing whether the sample may be looked on as random. The condition of randomness is
essential for all of the analysis explained in this book: that is, whether a sequence of random variables
$X_1, \ldots, X_n$ are independent based on a set of observations $x_1, \ldots, x_n$ of these random variables.
Here we will give a method based on the number of runs displayed in the sample events. This is a
nonparametric procedure. The run test is used to test the randomness of a sample at the $100(1 - \alpha)\%$
confidence level.


Given a sequence of two symbols, say H and T, a run is defined as a succession of identical symbols
contained between different symbols or none at all. The total number of runs in a sequence of $n$
trials serves as an indication whether the arrangement is random or not. If a sequence contains $n_1$
symbols of one kind and $n_2$ symbols of another kind and both $n_1$ and $n_2$ are greater than 10 (this
is a rule of thumb; for more accuracy we can also take both $n_1$ and $n_2$ as greater than 20), then
the sampling distribution of the total number of runs, $R$, has an asymptotic normal distribution
with mean

$$\mu_R = \frac{2n_1n_2}{n_1 + n_2} + 1$$

and variance

$$\sigma_R^2 = \frac{2n_1n_2\,(2n_1n_2 - n_1 - n_2)}{(n_1 + n_2)^2\,(n_1 + n_2 - 1)}.$$

For example, if we have the following symbols


HHH T HH TTTT HH TTT


there are six runs, shown here as groups separated by spaces, and $n_1 = 7$ and $n_2 = 8$. If the sample
contains numerical data, the run test is used by counting runs above and below the median. Denoting the
observations above the median by the letter A and observations below the median by the letter B, we can
determine the runs as before. For example, if we have data values


2 5 11 13 7 22 6 8 15 9


then the median is 8.5. Hence, we get the following arrangement of values above and below the
median:


BB AA B A BB AA.


Hence, there are six runs with $n_1 = 5$ and $n_2 = 5$.


Now we can formulate the test of randomness as a hypothesis testing problem as described in the 
following procedure. 
PROCEDURE FOR TEST OF RANDOMNESS USING RUN TEST
To test


$H_0$: Arrangement of sample values is random


versus


$H_a$: Data are not random.


1. Compute the median of the sample.
2. Going through the sample values, replace any observation with A if the value is above the median,
or B if the value is below the median. Discard any ties.
3. Compute $n_1$, $n_2$, and $R$. Also, compute the mean and variance of $R$:

$$\mu_R = \frac{2n_1n_2}{n_1 + n_2} + 1, \qquad \sigma_R^2 = \frac{2n_1n_2\,(2n_1n_2 - n_1 - n_2)}{(n_1 + n_2)^2\,(n_1 + n_2 - 1)}.$$

4. Compute the test statistic:

$$Z = \frac{R - \mu_R}{\sigma_R}.$$

5. Rejection region:
$|Z| > z_{\alpha/2}$.


6. Decision: If the test statistic falls in the rejection region, reject $H_0$ and conclude that the sample is
not random with $(1 - \alpha)100\%$ confidence.


Assumption: $n_1 > 10$ and $n_2 > 10$.


Note: Sometimes the same procedure is used with the median replaced by the mean of the sample.
That is, if the observation is above the sample mean, use A, and if it is below the sample mean, use B. We
use this procedure for large samples. For small sample sizes, to determine the upper and lower critical values,
a special table is needed. Some statistical software packages have the ability to use the run test for
randomness. For example, in Minitab we can use the following procedure.


Enter the data that we want to test for randomness in C1. Then: 


Stat > Nonparametric > Runs Test... > in Variables: enter C1 > OK


Default in Minitab is a run test with the mean. If we prefer median, type the value of the median by 
first clicking Above and below:. 


Example 12.B.1 
The following table gives radon concentration in pCi/L obtained from 40 houses in a certain area.
2.9    0.6   13.5   17.1   2.8    3.8   16.0   2.1    6.4   17.2
7.9    0.5   13.7   11.5   2.9    3.6   6.1    8.8    2.2   9.4
15.9   8.8   9.8    11.5   12.3   3.7   8.9    13.0   7.9   11.7
6.2    6.9   12.8   13.7   2.7    3.5   8.3    15.9   5.1   6.0


Test using Minitab (or some other software) whether the data are random at 95% confidence level.


Solution 
Running the data with Minitab, we get the following output. 


radon 
K = 8.3400 
The observed number of runs = 17 
The expected number of runs = 20.9500 
19 Observations above K 21 below 
The test is significant at 0.2046 
Cannot reject at alpha = 0.05 


Thus, at the 0.05 level of significance, we cannot reject the hypothesis that the data set is a random sample.
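The run test procedure can also be sketched directly in code. The following is a minimal illustration with a hypothetical function name. It uses the sample mean as the cutoff, as the Minitab output does (K = 8.3400), and it assumes the table is entered column by column, since that ordering is the one that reproduces the 17 runs reported above.

```python
import math

def runs_test(values, cutoff):
    """Large-sample run test; returns (R, n1, n2, mean, z, two-sided p)."""
    # True marks an observation above the cutoff; ties with the cutoff are dropped.
    signs = [v > cutoff for v in values if v != cutoff]
    n1 = sum(signs)                 # observations above the cutoff
    n2 = len(signs) - n1            # observations below the cutoff
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    mean = 2.0 * n1 * n2 / (n1 + n2) + 1
    variance = (2.0 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (runs - mean) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return runs, n1, n2, mean, z, p_two_sided

# Radon data as printed, row by row.
rows = [
    [2.9, 0.6, 13.5, 17.1, 2.8, 3.8, 16.0, 2.1, 6.4, 17.2],
    [7.9, 0.5, 13.7, 11.5, 2.9, 3.6, 6.1, 8.8, 2.2, 9.4],
    [15.9, 8.8, 9.8, 11.5, 12.3, 3.7, 8.9, 13.0, 7.9, 11.7],
    [6.2, 6.9, 12.8, 13.7, 2.7, 3.5, 8.3, 15.9, 5.1, 6.0],
]
# Assumption: observations were entered down the columns of the table.
radon = [v for column in zip(*rows) for v in column]

cutoff = sum(radon) / len(radon)
runs, n1, n2, mean, z, p_value = runs_test(radon, cutoff)
print(runs, n1, n2, round(mean, 2), round(p_value, 4))
```

Under that ordering there are 17 runs with 19 observations above and 21 below the mean, and the expected number of runs is 20.95, as in the output.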


EXERCISE 


Pick a couple of data sets from this book or your own and test for randomness using (i) hand 
calculations, and (ii) a statistical software package. 
Empirical Methods


Objective: In this chapter we introduce several empirical methods that are being increasingly used
in statistical computations as an alternative or as an improvement to classical statistical methods.


(Source: http://scienceworld.wolfram.com/biography/Ulam.html)


Stanislaw Ulam (1909-1986) was a Polish-American mathematician who was born in Lwów, Poland,
and came to the United States in 1936. He worked at Princeton University. He was involved with
the Manhattan Project to build the first atomic bomb. Ulam solved the problem of how to initiate
fusion in the hydrogen bomb. Ulam was interested in astronomy, physics, and mathematics from an
early age. He obtained his Ph.D. from the Polytechnic Institute in Lwów in 1933, where he studied
under a famous mathematician named Banach. Ulam's writing included A Collection of Mathematical
Problems (1960), Sets, Numbers and Universes (1974), and Adventures of a Mathematician (1976). His
major contribution to statistics is through the introduction of the Monte Carlo methods along with
Metropolis in 1949. These methods are widely used in solving mathematical problems using statistical
sampling. Monte Carlo methods became widely popular with the ever-increasing power of computers
and the development of specialized mathematical and statistical software.


13.1 INTRODUCTION 


In statistics, major efforts are made to develop and study accurate statistical models that are able to
describe natural phenomena. The dilemma is whether to use the standard model that may allow
closed-form solutions, or to describe the phenomenon more accurately, which would often preclude
the computation of explicit answers. Obtaining methods that result in useful qualitative and
quantitative understanding of realistic complex systems is difficult, and obtaining exact analytical tools
is not practical either. Because of this problem, practitioners have relied on simulation-based
methods. Computer simulation methods are becoming tools of choice for problems in statistics. Most of
the empirical methods discussed in this chapter had been in existence in the statistical literature as
possible numerical methods for some time. Because of the difficulty of computing by hand, these
methods did not gain much popularity. These numerical techniques became popular and practical
with the advent of high-quality pseudorandom number generators and high-speed computers.
Modern statistics is increasingly being equipped with theoretical concepts complemented with effective
computational tools to handle the challenges that arise in science and technology. The methods
presented in this chapter could be effectively used for Bayesian computation and for problems arising in
such diverse areas as environmental modeling, epidemiology, finance, genetics, image analysis, and
statistical physics.


It is important to note that the literature on these simulation methods is growing, and it is impossible 
to present the whole picture in a single chapter. The purpose of this chapter is only to introduce some 
basic and popular computational methods. There are many specialized books for further study. 


13.2 THE JACKKNIFE METHOD 


It was Tukey who in 1958 gave the name “jackknife” (sometimes also known as the Quenouille-Tukey
jackknife) to a general statistical method, invented by Maurice Quenouille in 1956, for testing
hypotheses and finding confidence intervals where traditional methods are not applicable or not well
suited. In general usage, a jackknife is a large clasp knife that has a multitude of small pull-out tools.
Because this method could be used for small tasks without resorting to other tools, it was named the
jackknife. The jackknife method could also be used with multivariate data. However, here we will only
present the method for univariate data. The jackknife procedure is very useful when outliers are present
in the data or the dispersion of the distribution is wide. In the jackknife method, we systematically
recompute the statistic, leaving out one observation at a time from the observed sample. This is used
to estimate the variability of a statistic from the variability of that statistic between subsamples. This
avoids the parametric assumptions that we used in obtaining the sampling distribution of the statistic
to calculate the standard error. Thus, this can be considered as a nonparametric estimate of the parameter.
Initially, the jackknife method was introduced for bias reduction (thus improving a given estimator),
and it is also a useful method for variance estimation. In this section, we study only how to compute a
jackknife estimate and a confidence interval. We do not discuss how it reduces bias or any other
theoretical properties.


Let $X_1, \ldots, X_n$ be a random sample from a population with finite variance. Then the sample mean is

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$

If one of the observations, say, the $k$th observation, is taken out (or missing), then

$$\bar{X}_{-k} = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i - X_k\right) = \frac{1}{n-1}\sum_{i=1,\, i \neq k}^{n} X_i.$$

Now, if we know the overall sample mean $\bar{X}$ and we calculated $\bar{X}_{-k}$, then we can obtain the deleted
observation $X_k$ by using the formula

$$X_k = n\bar{X} - (n-1)\bar{X}_{-k}.$$

In general, suppose that the population parameter $\theta$ is estimated by a function of the sample values
$\hat{\theta}(X_1, \ldots, X_n)$, represented by $\hat{\theta}$, and let $\hat{\theta}_{-k}$ be the corresponding estimate by removing the $k$th
observation. Note that here $\theta$ is any parameter; it need not be the population mean. Then the set of
"pseudo-values" $\hat{\theta}^*_k$, $k = 1, 2, \ldots, n$ is obtained by

$$\hat{\theta}^*_k = n\hat{\theta} - (n-1)\hat{\theta}_{-k}.$$

The average of these pseudo-values

$$\hat{\theta}^* = \frac{1}{n}\sum_{k=1}^{n} \hat{\theta}^*_k$$

is the jackknife estimate of the parameter $\theta$.

Let $s^{*2}$ be the sample variance of these pseudo-values. Then, the variance of $\hat{\theta}^*$ is estimated by $s^{*2}/n$,
and a $(1-\alpha)100\%$ jackknife confidence interval for $\theta$ is given by

$$\hat{\theta}^* \pm t_{\alpha/2}\frac{s^*}{\sqrt{n}},$$

where $t_{\alpha/2}$ is evaluated with $(n-1)$ degrees of freedom.


A PROCEDURE FOR JACKKNIFE POINT AND INTERVAL ESTIMATION


1. Generate a random sample $X_1, \ldots, X_n$ from a population.

2. First remove $X_1$ from the sample (so the new sample will be $X_2, \ldots, X_n$) and compute the estimator
$\hat{\theta}_{-1}$ (such as the sample mean); then remove $X_2$ (the resulting sample will be $X_1, X_3, \ldots, X_n$) and
compute the estimator $\hat{\theta}_{-2}$, and so on until the last sample is $X_1, \ldots, X_{n-1}$, with the estimator
being $\hat{\theta}_{-n}$.

3. The jackknife point estimate of $\theta$ is

$$\hat{\theta}^* = \frac{1}{n}\sum_{k=1}^{n} \hat{\theta}^*_k, \quad \text{where } \hat{\theta}^*_k = n\hat{\theta} - (n-1)\hat{\theta}_{-k}.$$

4. Calculate the sample variance of the values $\hat{\theta}_{-i}$, $i = 1, \ldots, n$, and denote the variance by $s^{*2}$.

5. A $(1-\alpha)100\%$ jackknife confidence interval for $\theta$ is given by

$$\hat{\theta}^* \pm t_{\alpha/2}\frac{s^*}{\sqrt{n}}.$$

Example 13.2.1 
A random sample of n = 6 from a given population resulted in the following data: 


7.2 5.7 4.9 6.2 8.5 2.8
(a) Find a jackknife point estimate of the population mean $\mu$.
(b) Construct a 95% jackknife confidence interval for the population mean $\mu$.


Solution 
(a) Here n = 6. Table 13.1 represents the original sample and the six jackknife samples. 


Table 13.1 

Original   Sample 1   Sample 2   Sample 3   Sample 4   Sample 5   Sample 6
7.2        5.7        7.2        7.2        7.2        7.2        7.2
5.7        4.9        4.9        5.7        4.9        4.9        4.9
4.9        6.2        6.2        6.2        5.7        6.2        6.2
6.2        8.5        8.5        8.5        8.5        5.7        8.5
8.5        2.8        2.8        2.8        2.8        2.8        5.7
2.8


Using Minitab descriptive statistics, we obtained the summary of the analysis given in Table 13.2. 
Now taking the mean and standard deviation of the means of the six jackknife samples, we get
$\hat{\mu}^* = 5.883$, and the standard deviation $s^* = 0.392$. Thus the jackknife point estimate of $\mu$ is
$\hat{\mu}^* = 5.883$, which is the same as the mean of the original sample. However, we can see that the
standard deviation resulting from the jackknife is only 0.392, compared to 1.959 for the original
sample.


Table 13.2

Variable   N   Mean    Median   TrMean   StDev   SE Mean
Original   6   5.883   5.950    5.883    1.959   0.800
Sample 1   5   5.620   5.700    5.620    2.068   0.925
Sample 2   5   5.920   6.200    5.920    2.188   0.978
Sample 3   5   6.080   6.200    6.080    2.123   0.949
Sample 4   5   5.820   5.700    5.820    2.183   0.976
Sample 5   5   5.360   5.700    5.360    1.656   0.741
Sample 6   5   6.500   6.200    6.500    1.395   0.624


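(b) A 95% jackknife confidence interval for $\mu$ is

$$\hat{\mu}^* \pm t_{\alpha/2}\frac{s^*}{\sqrt{n}} = 5.883 \pm 2.571\,\frac{0.392}{\sqrt{6}},$$

resulting in (5.472, 6.294). Compare this with Example 6.3.1, where we got the confidence interval
as (3.827, 7.939). Thus, through the jackknife method as applied here, we get a much tighter confidence
interval for $\mu$. Note, however, that $s^*$ here is the standard deviation of the six leave-one-out means,
which for the sample mean equals $s/(n-1) = 1.959/5 \approx 0.392$. If the pseudo-values
$n\hat{\theta} - (n-1)\hat{\theta}_{-k}$ are used instead, they reduce to the original observations, so that $s^* = 1.959$ and the
interval coincides with (3.827, 7.939).
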


The jackknife method of resampling is also known as the “leave-one-out” method because it uses all 
observations but one in each subsample. Here, every observation is left out exactly once. Note that 
in the jackknife method, sampling is done without replacement. This procedure can also be used for 
other statistical procedures such as hypothesis testing and regression. 
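
The jackknife computations of this section can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function name `jackknife` and its arguments are our own choices, the critical value $t_{\alpha/2}$ is supplied by hand (2.571 for 5 degrees of freedom), and the interval is built from the pseudo-values $n\hat{\theta} - (n-1)\hat{\theta}_{-k}$ defined earlier in this section.

```python
import math
import statistics


def jackknife(sample, estimator, t_crit):
    """Leave-one-out jackknife based on pseudo-values.

    `estimator` is any function of a list of numbers (an assumption of this
    sketch); `t_crit` is t_{alpha/2} with n - 1 degrees of freedom, looked up
    by the caller.
    """
    n = len(sample)
    theta_hat = estimator(sample)
    # Leave-one-out estimates: drop the k-th observation and recompute.
    loo = [estimator(sample[:k] + sample[k + 1:]) for k in range(n)]
    # Pseudo-values: n * theta_hat - (n - 1) * theta_hat_{-k}.
    pseudo = [n * theta_hat - (n - 1) * t for t in loo]
    theta_star = statistics.mean(pseudo)   # jackknife point estimate
    s_star = statistics.stdev(pseudo)      # std. dev. of the pseudo-values
    half_width = t_crit * s_star / math.sqrt(n)
    return theta_star, s_star, loo, (theta_star - half_width, theta_star + half_width)


# Data of Example 13.2.1; 2.571 is the upper 2.5% point of t with 5 d.f.
data = [7.2, 5.7, 4.9, 6.2, 8.5, 2.8]
theta_star, s_star, loo, ci = jackknife(data, statistics.mean, t_crit=2.571)
print([round(m, 2) for m in loo])          # leave-one-out means
print(round(theta_star, 3), round(s_star, 3), ci)
```

For these data the leave-one-out means agree with the sample means in Table 13.2, and their standard deviation is about 0.392. Because the pseudo-values of the sample mean are the original observations, `s_star` is about 1.959 here. Any other statistic, such as `statistics.variance`, can be passed as `estimator`.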


EXERCISES 13.2 


13.2.1. The following data represent the total ozone levels measured in Dobson units at randomly 
selected locations on earth on a particular day. 


269 246 388 354 266 303 
295 259 274 249 271 254 


(a) Find a jackknife point estimate of the population mean $\mu$ ozone level.
(b) Construct a 95% jackknife confidence interval for the population mean $\mu$.
(c) Compare the confidence interval obtained in part (b) with that in Example 6.3.3. 


13.2.2. A drug is suspected of causing an elevated heart rate in a certain group of high-risk patients. 
Twenty patients from this group were given the drug. The changes in heart rates were found
to be as follows.


-1 8 5 10 2 12 7 9 1 3
4 6 4 12 11 2 -1 10 2 8


Construct a 98% jackknife confidence interval for the mean change in heart rate. Interpret
your answer.


13.2.3. Air pollution in large U.S. cities is monitored to see whether it conforms to requirements set 
by the Environmental Protection Agency. The following data, expressed as an air pollution 
index, give the air quality of a city for 10 randomly selected days. 


57.3 58.1 58.7 66.7 58.6 61.9 59.0 64.4 62.6 64.9 


Construct a 95% jackknife confidence interval for the actual average air pollution index for 
this city and interpret. 


13.2.4. The mileage (in thousands) for a random sample of 10 rental cars from a large rental 
company’s fleet is listed. 
7 13 5 5 11 15 7 9 13 8
Find a 95% jackknife confidence interval for the population mean mileage of the rental cars 
of this company. 


13.2.5. The following data represent cholesterol levels (in mg/dL) of 10 randomly selected patients 
from a large hospital on a particular day. 
360 352 294 160 146 142 318 200 142 116 
Determine a 95% jackknife confidence interval for $\sigma^2$. Compare this with the confidence
interval obtained in Example 6.4.2. 


13.2.6. Air pollution in large U.S. cities is monitored to see whether it conforms to requirements set 
by the Environmental Protection Agency. The following data, expressed as an air pollution 
index, give the air quality of a city for five randomly selected days. 


56.23 57.12 57.7 63.92 59.40 


Construct a 99% jackknife confidence interval for the actual variance of the air pollution 
index for this city and interpret. 


13.2.7. It is known that some brands of peanut butter contain impurities within an acceptable
level. A test conducted on 12 randomly selected jars of a certain brand of peanut butter 
resulted in the following percentages of impurities: 


1.9 2.7 2.1 2.8 2.3 3.6 1.4 1.8 2.1 3.2 2.0 1.9


(a) Construct a 95% jackknife confidence interval for the average percentage of impurities 
in this brand of peanut butter. 

(b) Give an approximate 95% jackknife confidence interval for the population variance. 

(c) Interpret your results. 


13.2.8. The following is a random sample taken from the data that represents the time intervals in 
days between earthquakes that either registered magnitudes greater than 7.5 on the Richter 
scale or produced more than 1000 fatalities during the time period December 1902 to March
1977.


263 1901 121 832 150 99


(a) Construct a 95% jackknife confidence interval for the average number of days between
earthquakes of this type. 

(b) Give an approximate 95% jackknife confidence interval for the population variance of 
number of days between earthquakes of this type. 


13.3 AN INTRODUCTION TO BOOTSTRAP METHODS 


In this section, we describe some aspects of a relatively recent statistical technique known as the 
bootstrap method that can be used when the statistical distribution is unknown or the assumptions 
of normality are not satisfied. This is a general method for estimating sampling distributions. The 
concept of the bootstrap was introduced by Bradley Efron in 1979 and further developed by Efron 
and Tibshirani in 1993. We often try to determine the exact (sampling) distribution in an inferential
procedure, such as the sampling distribution of the sample mean, the median, or the variance, to 
be used in computing confidence intervals and for testing hypotheses. However, as we have seen, 
this is often the most difficult part of the work, because the sampling distribution depends on the 
population distribution, which is often unknown. This is the reason why asymptotic methods are 
quite frequently used for hypothesis testing and interval estimation. The bootstrap procedure pro- 
vides us with a simple method for obtaining an approximate sampling distribution of the statistic, 
conditional on the observed data. However, it should be noted that the distribution thus obtained is 
only approximate. It is not as “good” as the exact distribution, because we have only a sample from 
the population. However, often, a bootstrap sampling distribution is easier to compute. Bootstrap 
methods are computer-intensive methods that use simulation to calculate standard errors, confidence 
intervals, and significance tests. The methods are applied by researchers in business, econometrics, life 
sciences, medical sciences, social sciences, and other areas where statistics is being utilized. The bootstrap
method uses computer-generated pseudo-random numbers, so repeated runs on the same data may give
similar but not identical results. Also, it is computationally more involved to obtain results than
by using the asymptotic distribution. The advantage is that the results are conditional on observed 
data, not based on large sample approximations. How does bootstrap help in reality? For instance, 
suppose we have 10 years of monthly return data on a particular stock. If we were to use these data to 
predict the future return, say through linear regression, we would be assuming that the future is going 
to behave similarly to what happened in the past. We know from experience that such an assumption 
may not give us a good prediction and the underlying parametric assumptions may not hold. By 
creating bootstrap samples from these available data, what we are creating is not what happened, but 
rather what could have happened in the past from what did happen. For example, to see how resampling
affects the sample mean, suppose a particular mutual fund had the following total return (in percentage)
for the past 5 years:


Year 1 2 3 4 5 
Total return | 40.7 | 10.8 | 29.2 | 9.9 | 0.7 


In this case, the average return for the past 5 years is 18.26%. Resampling twice (what could
have happened) resulted in the following outcomes.


Year 1 2 3 4 5
Total return | 29.2 | 40.7 | 9.9 | 10.8 | 10.8


Here, the average is 20.28%. The next one gave the following:


Year 1 2 3 4 5 
Total return | 0.7 | 0.7 | 40.7 | 0.7 | 9.9 


The resulting average return is 10.54%. A realistic future prediction method should depend on these 
possible fluctuations that could have happened in different scenarios. 
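
The resampling illustrated above can be reproduced with a short Python sketch. It only checks the three averages quoted in the text and draws one further resample; the variable names and the seed are our own illustrative choices, and a different seed would give a different resample.

```python
import random
import statistics

# Observed total returns (%), years 1 through 5, as tabulated above.
returns = [40.7, 10.8, 29.2, 9.9, 0.7]

# The two resamples shown in the text.
resample_1 = [29.2, 40.7, 9.9, 10.8, 10.8]
resample_2 = [0.7, 0.7, 40.7, 0.7, 9.9]

mean_original = statistics.mean(returns)
mean_1 = statistics.mean(resample_1)
mean_2 = statistics.mean(resample_2)
print(round(mean_original, 2), round(mean_1, 2), round(mean_2, 2))

# A fresh resample: n draws with replacement, each value having probability 1/n.
rng = random.Random(0)  # arbitrary seed, used only for reproducibility
new_resample = rng.choices(returns, k=len(returns))
print(new_resample, round(statistics.mean(new_resample), 2))
```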


Most of the inferential procedures we learned are based on a single sample drawn from the population.
Bootstrap methods, in contrast, generate repeated subsamples from the single original sample itself 
and make inferences without assuming any particular functional form for the population distribution. 
Because this has the effect of sampling with replacement, we can create as many subsamples as we 
wish. These subsamples will have the same sample size and values as the original sample, except that 
many values in each of the subsamples will be repeated because of sampling with replacement. It 
should be noted that the effectiveness of a bootstrap procedure depends on the original sample being 
representative of the population. If the original sample is not representative, the conclusions drawn 
from the bootstrap methods will be completely inappropriate. 


Using the jackknife method, the size of resamples is confined to (n — 1), and the number of total 
possible samples is only n, the original sample size. The resampling strategy based on bootstrap 
has no such limitations in terms of the number and magnitude of replications possible. The only 
limitation comes from the computing resources, and these new sets of samples can be treated as a 
virtual population.


Example 13.3.1
Suppose that the population distribution is an $N(\mu, \sigma^2)$. Estimate $\sigma^2$.


Solution 
Because we know the functional form of the distribution, we could use the estimation procedures discussed 
in Chapter 5. There is no need for the bootstrap method. These steps are as follows. 


Step 1. If we have a random sample from $N(\mu, \sigma^2)$ of size $n$, use it. Otherwise, generate a random sample
$X_1, \ldots, X_n$ from $N(\mu, \sigma^2)$. This could be done using the method described in Project 4A of
Chapter 4.

Step 2. Estimate $\sigma^2$ by using the method of maximum likelihood, yielding

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2.$$

Note that the maximum likelihood procedure requires the knowledge of the functional form of the
distribution; see the derivation in Chapter 5. Suppose the form of the population distribution is not
known but we do have a random sample $X_1, \ldots, X_n$ from a distribution. Now we will describe how
we can estimate $\sigma^2$ using the bootstrap method.


Let $X_1, \ldots, X_n$ be a random sample from a probability distribution $F$ with $\mu = E(X_i)$ and
$\sigma^2 = \mathrm{Var}(X_i)$. Then the variance of $\bar{X}$ is $\sigma^2/n$, and its standard error is $\sigma/\sqrt{n}$. In general the
population distribution $F$ is unknown. A simple estimate of $F$ is the empirical (or sample) cumulative
distribution function defined by

$$\hat{F}(x) = \frac{\#\{X_i \le x\}}{n} = \text{Proportion of } X_i\text{'s} \le x.$$

This $\hat{F}$ is a step function with the size of the jump being $1/n$ at each ordered $X_i$.
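
As a small illustration of the empirical cumulative distribution function, the following Python sketch evaluates $\hat{F}$ for the ozone data of Exercise 13.2.1. The helper name `ecdf` is hypothetical and chosen only for this example.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return F_hat, where F_hat(x) is the proportion of sample values <= x."""
    data = sorted(sample)
    n = len(data)

    def f_hat(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(data, x) / n

    return f_hat


ozone = [269, 246, 388, 354, 266, 303, 295, 259, 274, 249, 271, 254]
F_hat = ecdf(ozone)
print(F_hat(245), F_hat(270), F_hat(388))
```

With $n = 12$ distinct observations, each ordered value raises $\hat{F}$ by $1/12$; six observations are at most 270, so $\hat{F}(270) = 0.5$.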


Now the bootstrap method of estimating the standard error of $\bar{X}$ could be summarized by the
following steps.


Step 1. Use the sample $X_1, \ldots, X_n$ and find $\hat{F}$, the empirical cumulative distribution function, as an
estimate of $F$.

Step 2. Generate a sample $\{X^*_{11}, X^*_{12}, \ldots, X^*_{1n}\}$ from $\hat{F}$. From this sample, compute $\bar{X}^*_1$.

Step 3. Repeat step 2 $(N-1)$ times to obtain samples $\{X^*_{i1}, X^*_{i2}, \ldots, X^*_{in}\}$, $i = 1, 2, \ldots, N$, and
find $\bar{X}^*_2, \bar{X}^*_3, \ldots, \bar{X}^*_N$. Now calculate

$$\bar{X}^* = \frac{1}{N}\sum_{i=1}^{N} \bar{X}^*_i.$$

This is the bootstrap mean.

Step 4. Then the bootstrap estimate of $\mathrm{Var}(\bar{X})$, denoted by $\hat{\sigma}^2_B$, is given by

$$\hat{\sigma}^2_B = \frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{X}^*_i - \bar{X}^*\right)^2.$$

Observe that once we have the subsample means $\bar{X}^*_1, \ldots, \bar{X}^*_N$, the formulas for calculating the
bootstrap mean and bootstrap variance are the same as those for calculating the mean and variance of a
given sample.


Note that when F is taken to be the empirical cumulative distribution function, generating a sample 
from F is equivalent to generating a sample from {X1,..., X,} with replacement. As a result, we 
obtain the following algorithm. 


BOOTSTRAP ALGORITHM FOR ESTIMATING THE STANDARD ERROR OF $\bar{X}$
1. Draw $N$ random samples with replacement from the original sample $X_1, \ldots, X_n$, with each
observation having the same probability of being drawn $(1/n)$. Let these bootstrap samples be
denoted by $\{X^*_{i1}, X^*_{i2}, \ldots, X^*_{in}\}$, $i = 1, 2, \ldots, N$.
2. Calculate the sample means of each of these bootstrap samples and the overall sample mean by

$$\bar{X}^*_i = \frac{1}{n}\sum_{j=1}^{n} X^*_{ij}, \qquad \bar{X}^* = \frac{1}{N}\sum_{i=1}^{N} \bar{X}^*_i.$$

3. Compute

$$\hat{\sigma}^2_B = \frac{1}{N-1}\sum_{i=1}^{N}\left(\bar{X}^*_i - \bar{X}^*\right)^2.$$

4. Then the bootstrap estimate of $\mathrm{Var}(\bar{X})$ is $\hat{\sigma}^2_B$, or equivalently, the standard error of $\bar{X}$ is $\sqrt{\hat{\sigma}^2_B}$.


It is not necessary that the size of the bootstrap sample also must be n or the samples have to be
obtained with replacement. However, it is suggested that the best results are obtained when the 
repeated samples are the same size n as the original sample and when the samples are obtained with 
replacement. The number of bootstrap samples N could be in the hundreds or more, depending only 
on the capacity of the software that we are using to generate these samples. 
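
The algorithm for the standard error of the sample mean can be sketched in Python as follows. This is one possible implementation under our own assumptions: the function name, the seed, and the choice $N = 2000$ are illustrative, and the ozone data are those of Exercise 13.2.1. Because the samples are generated by a pseudo-random number generator, other seeds give slightly different values.

```python
import math
import random
import statistics


def bootstrap_se_of_mean(sample, n_boot, rng):
    """Bootstrap mean, variance estimate, and standard error of the sample mean."""
    n = len(sample)
    # Steps 1-2: N resamples of size n with replacement; record each mean.
    boot_means = [sum(rng.choices(sample, k=n)) / n for _ in range(n_boot)]
    boot_mean = sum(boot_means) / n_boot
    # Step 3: sample variance (divisor N - 1) of the bootstrap means.
    var_b = statistics.variance(boot_means)
    # Step 4: the standard error is the square root of that variance.
    return boot_mean, var_b, math.sqrt(var_b)


ozone = [269, 246, 388, 354, 266, 303, 295, 259, 274, 249, 271, 254]
rng = random.Random(2024)  # arbitrary seed for reproducibility
boot_mean, var_b, se_b = bootstrap_se_of_mean(ozone, n_boot=2000, rng=rng)
print(round(boot_mean, 2), round(var_b, 2), round(se_b, 2))
```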


Example 13.3.2 
The following data represent the total ozone levels measured in Dobson units at randomly selected locations 
on Earth on a particular day. 


269 246 388 354 266 303 
295 259 274 249 271 254 


Generate N = 200 bootstrap samples of size 12 each and find the bootstrap mean and standard deviation
(standard error).


Solution 
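Using Minitab (see Example 13.7.1 for the steps), we created 200 bootstrap samples of size 12. We obtain
the following summary results:

$$\bar{X}^* = 285.74$$

and

$$\hat{\sigma}^2_B = 153.02 \quad \text{and} \quad \hat{\sigma}_B = 12.37.$$

Note that the mean of the original sample is 285.7, but the standard deviation is 43.9. Even though the means
of the original sample and the bootstrap means are very close, their standard deviations are substantially
different. This is expected: 43.9 measures the spread of the individual observations, whereas 12.37 estimates
the standard error of the sample mean, which is comparable to $s/\sqrt{n} = 43.9/\sqrt{12} \approx 12.67$.
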


In real applications, one of the difficulties is to estimate the standard errors of more complicated 
statistics. We can now generalize the bootstrap method for those situations. Let $\hat{\theta} = \hat{\theta}(X_1, \ldots, X_n)$
be a sample statistic that estimates the parameter $\theta$ of an unknown distribution $F$ using some
procedure. We wish to estimate the standard error of $\hat{\theta}$ using the bootstrap procedure, which is
summarized next.


GENERAL BOOTSTRAP PROCEDURE TO ESTIMATE THE STANDARD ERROR OF $\hat{\theta}$
1. Draw $N$ samples with replacement from the original sample, $(X_1, \ldots, X_n)$. Denote these
bootstrap samples by $\{X^*_{i1}, X^*_{i2}, \ldots, X^*_{in}\}$, $i = 1, 2, \ldots, N$.
2. Compute $\hat{\theta}^*_1, \hat{\theta}^*_2, \ldots, \hat{\theta}^*_N$, where

$$\hat{\theta}^*_i = \hat{\theta}(X^*_{i1}, X^*_{i2}, \ldots, X^*_{in}), \quad i = 1, \ldots, N.$$

The procedure for computing $\hat{\theta}^*_i$ is the same procedure as that used to compute $\hat{\theta}$ from the original sample
$X_1, \ldots, X_n$. Also, compute

$$\hat{\theta}^* = \frac{1}{N}\sum_{i=1}^{N} \hat{\theta}^*_i.$$

3. The bootstrap estimator of standard error (BSE) of $\hat{\theta}$ is given by

$$\mathrm{BSE}(\hat{\theta}) = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}\left(\hat{\theta}^*_i - \hat{\theta}^*\right)^2}.$$


It is clear that these algorithms are considerably computer intensive and it is necessary to have suitable
software to implement them. The accuracy of the bootstrap approximation depends on the accuracy
of $\hat{F}$ as an estimate of $F$ and how large a bootstrap sample is used to estimate the standard error of
$\hat{\theta}$. We will leave the computation to Project 13A. We now give a theoretical example.
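
For readers who want to experiment before attempting the project, the general procedure can be sketched in Python with the statistic passed in as a function. The names `bootstrap_se` and `statistic`, the seed, and the use of the median on the ozone data of Exercise 13.2.1 are assumptions made for this illustration only.

```python
import math
import random
import statistics


def bootstrap_se(sample, statistic, n_boot, rng):
    """Bootstrap average and standard error (BSE) of an arbitrary statistic."""
    n = len(sample)
    # Steps 1-2: apply the same statistic to each resample of size n.
    theta_stars = [statistic(rng.choices(sample, k=n)) for _ in range(n_boot)]
    theta_bar = sum(theta_stars) / n_boot
    # Step 3: square root of the sample variance (divisor N - 1).
    bse = math.sqrt(sum((t - theta_bar) ** 2 for t in theta_stars) / (n_boot - 1))
    return theta_bar, bse


ozone = [269, 246, 388, 354, 266, 303, 295, 259, 274, 249, 271, 254]
rng = random.Random(13)  # arbitrary seed for reproducibility
theta_bar, bse = bootstrap_se(ozone, statistics.median, n_boot=2000, rng=rng)
print(round(theta_bar, 2), round(bse, 2))
```

Replacing `statistics.median` by another function of the sample, for example `statistics.variance`, gives the bootstrap standard error of that statistic without any further change.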
Example 13.3.3 
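Let $X_1, \ldots, X_n$ be a sample from a Poisson distribution with parameter $\lambda$. Let

$$\theta = P\{X \le 1\} = e^{-\lambda}(1 + \lambda).$$

Obtain a bootstrap estimate of $\theta$.


Solution
It can be shown that the maximum likelihood estimator of $\theta$ is

$$\hat{\theta}_{ML} = e^{-\bar{X}}\left(1 + \bar{X}\right).$$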


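In order to estimate the bias of $\hat{\theta}_{ML}$, take $N$ bootstrap samples from $\{X_1, \ldots, X_n\}$. Let

$$\hat{\theta}_i = e^{-\bar{X}^*_i}\left(1 + \bar{X}^*_i\right) - \frac{\#\{X^*_{ij} \le 1\}}{n}, \quad i = 1, \ldots, N,$$

where $\bar{X}^*_i$ is the mean of the $i$th bootstrap sample $\{X^*_{i1}, \ldots, X^*_{in}\}$.
Then the bootstrap estimate of the bias of $\hat{\theta}_{ML}$ is

$$\hat{\theta}_{bias} = \frac{\hat{\theta}_1 + \cdots + \hat{\theta}_N}{N}.$$

One might now use

$$e^{-\bar{X}}\left(1 + \bar{X}\right) - \hat{\theta}_{bias}$$

as an estimator of $\theta$.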


13.3.1 Bootstrap Confidence Intervals 


We could use the repeated sampling method to construct bootstrap confidence intervals. We now 
give a procedure to obtain this. 


PROCEDURE TO FIND BOOTSTRAP CONFIDENCE INTERVAL FOR THE MEAN 
1. Draw N samples (N will be in the hundreds, and if the software allows, in the thousands) from the 
original sample with replacement. 




2. For each of the samples, find the sample mean. 

3. Arrange these sample means in order of magnitude. 

4. To obtain, say, a 95% confidence interval, we will find the middle 95% of the sample means. For this,
find the means at the 2.5th and 97.5th percentiles. The 2.5th percentile will be at the position
(0.025)(N + 1), and the 97.5th percentile will be at the position (0.975)(N + 1). If any of these 
numbers are not integers, round to the nearest integer. The values of these positions are the lower 
and upper limits of the 95% bootstrap interval for the true mean. 


It should be noted that every time we do this procedure, we may get a slightly different bootstrap 
interval. We now give an example. 




Example 13.3.4 
For the data given in Example 13.3.2, obtain a 95% bootstrap confidence interval for $\mu$.


Solution

We took $N = 200$ samples of size 12. Thus $0.025 \times 201 = 5.025 \approx 5$ and $0.975 \times 201 = 195.975 \approx 196$.
Thus, taking the 5th and 196th values of sorted (in ascending order) sample means, we get the 95% bootstrap
confidence interval for $\mu$ as


(263.8, 311.5). 


1. Compared with the classical confidence interval we obtained in Example 6.3.3, which is (257.81, 313.59),
the bootstrap confidence interval of Example 13.3.4 has smaller length, and thus less variability. In
addition, we saw in Example 6.3.3 that the normality assumption necessary for the confidence interval 
there was suspect. In the bootstrap method, we did not have any distributional assumptions. 

2. Because the bootstrap methods are more in tune with nonparametric methods, sometimes it makes 
sense to obtain a confidence interval about the median rather than the mean. With a slight modification
of the procedure that we have described for the bootstrap confidence interval for the mean,
we can obtain the bootstrap confidence interval for the median.



PROCEDURE TO FIND BOOTSTRAP CONFIDENCE INTERVAL FOR THE MEDIAN 

1. Draw N samples (N will be in the hundreds, and if the software allows, in the thousands) from the 
original sample with replacement. 

2. For each of the samples, find the sample median. 

3. Arrange these sample medians in order of magnitude. 

4. To obtain, say, a 95% confidence interval we will find the middle 95% of the sample medians. For 
this, find the medians at the 2.5th and 97.5th percentiles. The 2.5th percentile will be at the position
(0.025)(N + 1), and the 97.5th percentile will be at the position (0.975)(N + 1). If any of these 
numbers are not integers, round to the nearest integer. The values of these positions are the lower 
and upper limits of the 95% bootstrap interval for the median.


In practice, how many bootstrap samples should be taken? The answer depends on two things: how
much the result matters, and what type of computing power is available. In general, it is better to start 
with 1000 subsamples. With the computational power available now, even taking 10,000 replications 
is not much of a problem. There are many works in the literature on bootstrap hypothesis testing and 
regression. These are beyond the scope of this chapter. 
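
The percentile procedures for the mean and the median differ only in the statistic computed on each resample, so both can be sketched with one Python function. The names, the seed, and the clamping of positions to the range 1 to $N$ (needed only when $N$ is very small) are assumptions of this sketch, not part of the procedure as stated; the data are the ozone levels of Example 13.3.2.

```python
import random
import statistics


def percentile_positions(n_boot, level=0.95):
    """1-based positions (alpha/2)(N+1) and (1-alpha/2)(N+1), rounded."""
    alpha = 1 - level
    lower = round((alpha / 2) * (n_boot + 1))
    upper = round((1 - alpha / 2) * (n_boot + 1))
    # Clamp so that the positions are valid for very small N (our assumption).
    return max(1, lower), min(n_boot, upper)


def bootstrap_percentile_ci(sample, statistic, n_boot, rng, level=0.95):
    n = len(sample)
    # Steps 1-3: resample, compute the statistic, and sort the results.
    values = sorted(statistic(rng.choices(sample, k=n)) for _ in range(n_boot))
    lo_pos, hi_pos = percentile_positions(n_boot, level)
    # Step 4: read off the values at the two positions (lists are 0-based).
    return values[lo_pos - 1], values[hi_pos - 1]


ozone = [269, 246, 388, 354, 266, 303, 295, 259, 274, 249, 271, 254]
rng = random.Random(7)  # arbitrary seed for reproducibility
ci_mean = bootstrap_percentile_ci(ozone, statistics.mean, 1000, rng)
ci_median = bootstrap_percentile_ci(ozone, statistics.median, 1000, rng)
print(percentile_positions(200), ci_mean, ci_median)
```

For $N = 200$ the positions are the 5th and 196th ordered values, as in Example 13.3.4. The intervals themselves change slightly with the seed.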


EXERCISES 13.3 


13.3.1. For the data of Exercise 13.2.2, generate N = 8 bootstrap samples of size 20 each and find 
the bootstrap mean and standard deviation (standard error). 


13.3.2. For the data of Exercise 13.2.5, generate N = 12 bootstrap samples of size 10 each and find 
the bootstrap mean and standard deviation (standard error). 


13.3.3. For the data of Exercise 13.3.3, obtain a 95% bootstrap confidence interval for $\mu$.


13.3.4. For the data of Exercise 13.2.6, (a) obtain a 95% bootstrap confidence interval for $\mu$, and
(b) obtain a 95% bootstrap confidence interval for the population median.


13.3.5. For the data of Exercise 13.2.8, (a) obtain a 95% bootstrap confidence interval for $\mu$, and
(b) obtain a 95% bootstrap confidence interval for the population median. 


13.4 THE EXPECTATION MAXIMIZATION ALGORITHM 


In this section, we introduce an algorithm, called the expectation maximization (EM) algorithm that 
is widely used to compute maximum likelihood estimates when some elements of the data set are 
either missing or unobservable. In real-life problems, observing the complete data is the exception 
rather than the rule. For example, in lifetime studies, when n items are placed on a given test, we 
may have the failure times of only n1 <n items while for the rest of (n — n1) items we only know the 
censored failure time, that they survived a particular failure time T (fixed beforehand). For example, 
we may want to know whether the lifetime of a certain brand of fluorescent light bulbs is at least 24 
months. For this purpose, let us say we randomly test 100 light bulbs of this brand. In this case, our 
data will contain all the months within which the bulbs burned out, and some that survived for 24 
months. After 24 months, we may not follow when these bulbs burn out; all we know is that these 
bulbs lasted for 24 months. Such data are an example of censored data. We can consider the censored
failure times of the $(n - n_1)$ items as the unobservable data values.


Another common problem is of missing data. For example, suppose we were to take a survey on some 
socioeconomic problems from a random sample of families from a city in 2000 and then a follow-up 
study on the same families in 2005. This may result in many missing values in the follow-up study, 
because it is possible that we may not be able to locate some of the families. Missing values can also 
occur if some of the respondents refuse to answer certain questions. We have seen in Section 5.3 that 
sometimes it is not possible to obtain closed-form solutions for MLE. In the completely observed 
case, there are other algorithms, such as Newton-Raphson, that can be used to numerically obtain the 
estimates. With missing values, those algorithms cannot be used. The name EM algorithm was coined 
by Dempster, Laird, and Rubin in 1977. This is a general iterative algorithm to obtain the MLE when
the data set is incomplete. The EM algorithm is a formalization of an intuitive idea of estimating
parameters with missing data: (i) replace missing values with estimated values and treat them as true
values, (ii) estimate parameters, (iii) repeat.


Let $X_1, \ldots, X_{n_1}$ be the $n_1$ observed data values, and let $y_1, \ldots, y_{n-n_1}$ be the $(n - n_1)$ unobserved data
values. Assume that the $X$'s are iid random variables with pdf $f(x|\theta)$ and that the $X$'s and $Y$'s are independent,
that is, data are missing at random.


We denote the random vector by X and the corresponding data vector by x. 


The joint pdf of $X_1, \ldots, X_{n_1}$ is represented by $f(\mathbf{x}|\theta)$, where $\theta$ is the parameter vector with values in
$\Theta \subset \mathbb{R}^p$, a $p$-dimensional Euclidean space. Let $g(\mathbf{x}, \mathbf{y}|\theta)$ denote the pdf of the complete data set $\mathbf{x}$
and $\mathbf{y}$, that is, the vector $(\mathbf{x}, \mathbf{y})$ represents the conceptualized complete data set. Let $h(\mathbf{y}|\theta, \mathbf{x})$ be the
conditional pdf of the unobserved data $\mathbf{y}$ given $\theta$ and the observed data $\mathbf{x}$. The likelihood function
for the observed data $\mathbf{x}$ is, by definition,

$$L(\theta; \mathbf{x}) = f(\mathbf{x}|\theta).$$

The likelihood function for the combined data $(\mathbf{x}, \mathbf{y})$ is again by definition given by

$$L_c(\theta; \mathbf{x}, \mathbf{y}) = g(\mathbf{x}, \mathbf{y}|\theta).$$

The problem is to find the maximum likelihood estimator that maximizes the likelihood function
$L(\theta; \mathbf{x})$, at the same time using $L_c(\theta; \mathbf{x}, \mathbf{y})$.


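From the foregoing definitions, we have the conditional pdf of the missing (or unobserved) data $\mathbf{y}$,
given $\mathbf{x}$:

$$h(\mathbf{y}|\theta, \mathbf{x}) = \frac{g(\mathbf{x}, \mathbf{y}|\theta)}{f(\mathbf{x}|\theta)},$$

or equivalently

$$f(\mathbf{x}|\theta) = \frac{g(\mathbf{x}, \mathbf{y}|\theta)}{h(\mathbf{y}|\theta, \mathbf{x})}. \tag{13.1}$$

Let $\theta_0 \in \Theta$ be a given $\theta$-value. Because $h(\mathbf{y}|\theta_0, \mathbf{x})$ is a pdf, we have

$$\int h(\mathbf{y}|\theta_0, \mathbf{x})\, d\mathbf{y} = 1.$$

Thus,

$$\ln L(\theta; \mathbf{x}) = \ln L(\theta; \mathbf{x}) \int h(\mathbf{y}|\theta_0, \mathbf{x})\, d\mathbf{y} = \int \ln L(\theta; \mathbf{x})\, h(\mathbf{y}|\theta_0, \mathbf{x})\, d\mathbf{y}$$

(as $\ln L(\theta; \mathbf{x})$ is independent of $\mathbf{y}$).

Because $L(\theta; \mathbf{x}) = f(\mathbf{x}|\theta)$, we have

$$\begin{aligned}
\ln L(\theta; \mathbf{x}) &= \int \ln f(\mathbf{x}|\theta)\, h(\mathbf{y}|\theta_0, \mathbf{x})\, d\mathbf{y} \\
&= \int \left[\ln g(\mathbf{x}, \mathbf{y}|\theta) - \ln h(\mathbf{y}|\theta, \mathbf{x})\right] h(\mathbf{y}|\theta_0, \mathbf{x})\, d\mathbf{y} \quad \text{(from (13.1))} \\
&= \int \ln g(\mathbf{x}, \mathbf{y}|\theta)\, h(\mathbf{y}|\theta_0, \mathbf{x})\, d\mathbf{y} - \int \ln h(\mathbf{y}|\theta, \mathbf{x})\, h(\mathbf{y}|\theta_0, \mathbf{x})\, d\mathbf{y} \\
&= E_{\theta_0}\left[\ln g(\mathbf{x}, \mathbf{y}|\theta)\right] - E_{\theta_0}\left[\ln h(\mathbf{y}|\theta, \mathbf{x})\right],
\end{aligned} \tag{13.2}$$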


where the expectation is taken with respect to the conditional distribution of $\mathbf{y}$ given $\theta_0$ and $\mathbf{x}$. Let us
now consider maximizing this with respect to 6. This maximization is the maximization step (M-step) 
in the EM algorithm. 


Let $\theta_0$ be an initial estimate of $\theta$. The choice of this initial value $\theta_0$ could be done randomly or
heuristically based on any prior knowledge about the optimal value of the parameter. For instance,
suppose we have to estimate mean and variance of a normal distribution. One good starting point
could be to take the sample mean $\bar{x}$ and sample variance $s^2$ based on a subset of data containing no
missing values. 


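Let

$$Q(\theta|\theta_0, \mathbf{x}) = E_{\theta_0}\left[\ln L_c(\theta; \mathbf{x}, \mathbf{y})\right] = E_{\theta_0}\left[\ln g(\mathbf{x}, \mathbf{y}|\theta)\right].$$

Here, $\theta_0$ is used only to compute the expectation; we should not substitute it for $\theta$ in the complete
data log-likelihood. Let $\hat{\theta}_{(1)}$ be the maximizer of $Q(\theta|\theta_0, \mathbf{x})$ with respect to $\theta$. That is,
$Q(\hat{\theta}_{(1)}|\theta_0, \mathbf{x}) \ge Q(\theta|\theta_0, \mathbf{x})$ for all $\theta \in \Theta$. Then $\hat{\theta}_{(1)}$ is the first-step estimator of $\theta$. Continuing the
procedure we obtain a sequence of estimators $\hat{\theta}_{(m)}$, which under appropriate conditions converges to
the maximum likelihood estimate, that is, to a maximizer of the observed-data likelihood $L(\theta; \mathbf{x})$.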


STEPS FOR EXPECTATION MAXIMIZATION ALGORITHM 
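1. $\hat{\theta}_{(n)}$ is the estimate of the parameter $\theta$ on the $n$th step.
2. Expectation step (E-step). Compute

$$Q(\theta|\hat{\theta}_{(n)}, \mathbf{x}) = E_{\hat{\theta}_{(n)}}\left[\ln g(\mathbf{x}, \mathbf{y}|\theta)\right],$$

where the expectation is with respect to the conditional pdf of $\mathbf{y}$ given $\hat{\theta}_{(n)}$ and $\mathbf{x}$ (i.e., with respect
to $h(\mathbf{y}|\hat{\theta}_{(n)}, \mathbf{x})$).
3. Maximization step (M-step). Find $\hat{\theta}_{(n+1)} \in \Theta$ such that

$$\hat{\theta}_{(n+1)} = \arg\max_{\theta \in \Theta} Q(\theta|\hat{\theta}_{(n)}, \mathbf{x}).$$

4. Repeat until convergence criteria are met.


Thus, in the EM algorithm, each iteration involves two steps: the expectation step (E-step), followed by
the maximization step (M-step). In the E-step, we find the conditional expectation of the unobserved
or missing data given the observed data and the current estimated parameters. That is, the E-step
constitutes the calculation of

$$Q(\theta|\hat{\theta}_{(n)}, \mathbf{x}) = E_{\hat{\theta}_{(n)}}\left[\ln g(\mathbf{x}, \mathbf{y}|\theta)\right] = \int \ln g(\mathbf{x}, \mathbf{y}|\theta)\, h(\mathbf{y}|\hat{\theta}_{(n)}, \mathbf{x})\, d\mathbf{y}$$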


(which is the sum if discrete), where the integration is over the range of values that $\mathbf{y}$ can take. The
M-step constitutes maximization of $Q(\theta|\hat{\theta}_{(n)}, \mathbf{x})$ with respect to $\theta$. This procedure improves the
log-likelihood at every iteration, that is, the log-likelihood is nondecreasing for every iteration. Thus,
for the sequence $\{\hat{\theta}_{(n)}\}$ obtained through the EM algorithm, we have $L(\hat{\theta}_{(n+1)}; \mathbf{x}) \ge L(\hat{\theta}_{(n)}; \mathbf{x})$, with
equality holding if and only if $Q(\hat{\theta}_{(n+1)}|\hat{\theta}_{(n)}, \mathbf{x}) = Q(\hat{\theta}_{(n)}|\hat{\theta}_{(n)}, \mathbf{x})$. When we have filled in the complete
data set, the parameter $\theta$ can be estimated by the usual maximum likelihood procedure
(M-step). It can be shown that under some conditions (such as that $\ln f(\mathbf{x}|\theta)$ is bounded, or that
$Q(\theta|\theta_0, \mathbf{x})$ is continuous in both $\theta$ and $\theta_0$), $\hat{\theta}_{(n)}$ converges in probability as $n \to \infty$ to the maximum
likelihood estimate, that is, to a maximizer of the observed-data likelihood $L(\theta; \mathbf{x})$.


For computational purposes, the E-step and M-step are alternated repeatedly until the difference 
$L(\hat{\theta}_{(n+1)}; \mathbf{x}) - L(\hat{\theta}_{(n)}; \mathbf{x})$ is less than $\delta$, a small but prescribed quantity. Another possible convergence
criterion is to stop the iteration when the distance between $\hat{\theta}_{(n+1)}$ and $\hat{\theta}_{(n)}$ becomes arbitrarily small.
In practice, it may be necessary to run the EM algorithm a number of times with different (random) 
starting points to ensure that the global maximum is obtained. 
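
To make the alternation of the two steps concrete, here is a hedged Python sketch for one simple model: exponential lifetimes with mean $\theta$, of which some are right-censored at a fixed time $T$. The update used is our own illustration and relies on the memoryless property $E[Y \mid Y > T] = T + \theta$; the data, the function name, and the tolerance are hypothetical and are not taken from the text.

```python
def em_censored_exponential(observed, n_censored, T, theta0,
                            tol=1e-10, max_iter=1000):
    """EM iterations for the mean of an exponential with right-censoring at T."""
    n1 = len(observed)
    n = n1 + n_censored
    total_observed = sum(observed)
    theta = theta0
    for iteration in range(1, max_iter + 1):
        # E-step: expected complete-data sufficient statistic (sum of all
        # lifetimes), using E[Y | Y > T, theta] = T + theta for censored items.
        expected_total = total_observed + n_censored * (T + theta)
        # M-step: the complete-data MLE of an exponential mean is the average.
        theta_new = expected_total / n
        # Stop when successive iterates are closer than tol.
        if abs(theta_new - theta) < tol:
            return theta_new, iteration
        theta = theta_new
    return theta, max_iter


# Hypothetical data: five observed lifetimes and three items censored at T = 4.
observed = [1.0, 2.0, 3.0, 5.0, 0.5]
theta_em, n_iter = em_censored_exponential(observed, n_censored=3, T=4.0,
                                           theta0=sum(observed) / len(observed))
closed_form = (sum(observed) + 3 * 4.0) / len(observed)
print(theta_em, closed_form, n_iter)
```

In this model the fixed point of the iteration can be solved by hand, giving (sum of observed lifetimes $+\, n_2 T$)$/n_1$, so the sketch can be checked against a closed form. In models without such a closed form the same loop structure applies, but the E-step and M-step must be worked out for the model at hand.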


In general, the E-step and M-step could be complex. Even though the EM algorithm is applicable to 
any model, it is particularly effective if the data come from an exponential family. It turns out that, 
in this case, the log-likelihood is linear in the sufficient statistic for $\theta$. For the E-step, simply compute
the expectation of the complete data sufficient statistic given the observed data. By substituting the
conditional expectations of the sufficient statistics computed in the E-step for the sufficient statistics
that occur in the expression obtained for the complete data maximum likelihood estimators of $\theta$,
we can obtain the next iterate in the M-step. Thus, when the complete data set is from an exponential 
family, both the E-step and the M-step are simplified. 


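Let $\mathbf{z} = (\mathbf{x}, \mathbf{y})$ be the complete observation vector. A particular case in which $g(\mathbf{x}, \mathbf{y}|\theta) = g(\mathbf{z}, \theta)$ is
from an exponential family:

$$g(\mathbf{z}, \theta) = a(\mathbf{z}) \exp\{k'(\theta)\, t(\mathbf{z})\}/c(\theta),$$

where $t(\mathbf{z})$ is a vector of sufficient statistics with complete data, $k'(\theta)$ is a vector function of the
parameter vector $\theta$, and $a(\mathbf{z})$ and $c(\theta)$ are scalar functions. Recall that the members of the exponential
family include many popular distributions, such as the normal, multivariate normal, Poisson, and
multinomial distributions. In this case, the E-step can be written as

$$Q(\theta|\hat{\theta}_{(n)}, \mathbf{x}) = E_{\hat{\theta}_{(n)}}\left[\ln a(\mathbf{z})\,|\,\mathbf{x}\right] + k'(\theta)\, t_{(n)} - \ln c(\theta),$$

where $t_{(n)} = E_{\hat{\theta}_{(n)}}\left[t(\mathbf{Z})\,|\,\mathbf{x}\right]$ is an estimator of the sufficient statistic. The M-step maximizes the Q-function
with respect to $\theta$. Because $E_{\hat{\theta}_{(n)}}\left[\ln a(\mathbf{z})\,|\,\mathbf{x}\right]$ does not depend on $\theta$, we can rewrite the steps as follows:


E-step: Compute $t_{(n)} = E_{\hat{\theta}_{(n)}}\left[t(\mathbf{Z})\,|\,\mathbf{x}\right]$.
M-step: Find $\hat{\theta}_{(n+1)} \in \Theta$ such that

$$\hat{\theta}_{(n+1)} = \arg\max_{\theta \in \Theta}\left[k'(\theta)\, t_{(n)} - \ln c(\theta)\right].$$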


The following example gives an EM algorithm for a special case of censored survival times. In the 
following example, the survival function is defined as the probability that an individual survives 
beyond time y, that is, S(y) = P(Y > y). 


Example 13.4.1 
Let $x = (x_1, \ldots, x_{n_1})$ be observed data and let the censored observations at $T$ be $y = (y_1, \ldots, y_{n_2})$ (that is, 
the survival time is at least $T$). Let the mean survival time be $\theta$, and the probability density be given by 


$$f(x|\theta) = \theta^{-1} \exp(-x/\theta), \quad x > 0.$$


(a) Obtain the MLE, $\hat{\theta}_{ML}$. 

(b) Obtain an EM algorithm. 

(c) Consider the following censored data, which represent the number of years 20 patients survived 
after a major surgery, where a + symbol indicates that we know only that the patient survived for 
at least 4 years, with no further information. 


4+ | 12 | 12 | 1 | 4+ | 3 | 3 | 5 | 2 | 0 
5 | 1 | 4+ | 0 | 3 | 13 | 13 | 1 | 0 | 4 


Using the algorithm developed in part (b), run for 50 iterations with the initial value $\theta_0$ being the observed 
sample mean, $\bar{x}$, and with $\theta_0 = 0$. Comment on the results. 


Solution 
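The joint pdf of the uncensored observations, $x$, is 


$$f(x|\theta) = \frac{1}{\theta^{n_1}} \exp\left(-\sum_{i=1}^{n_1} x_i/\theta\right).$$


For the right censored (at $T$) observations $y_i$, $i = 1, \ldots, n_2$, the pdf can be calculated as follows: 

$$K \int_T^{\infty} \frac{1}{\theta}\, e^{-y/\theta}\, dy = 1$$

implies that $K = e^{T/\theta}$. Thus, the pdf of $y_i$ is given by 


$$h(y_i|\theta) = \frac{e^{T/\theta}}{\theta}\, e^{-y_i/\theta} = \frac{1}{\theta}\, e^{(T - y_i)/\theta}, \quad y_i > T.$$


(a) The likelihood, $L_c(\theta, x, y)$, can also be written in the form 


$$L_c(\theta, x, y) = \frac{1}{\theta^{n_1}}\, e^{-\sum_{i=1}^{n_1} x_i/\theta} \prod_{i=1}^{n_2} e^{-T/\theta} = \frac{1}{\theta^{n_1}}\, e^{-\sum_{i=1}^{n_1} x_i/\theta}\, e^{-n_2 T/\theta}.$$

Thus, 

$$\ln L_c(\theta, x, y) = -n_1 \ln\theta - \frac{\sum_{i=1}^{n_1} x_i}{\theta} - \frac{n_2 T}{\theta}.$$

Differentiating with respect to $\theta$, and equating to zero, 

$$\frac{\partial}{\partial\theta} \ln L_c(\theta, x, y) = -\frac{n_1}{\theta} + \frac{\sum_{i=1}^{n_1} x_i}{\theta^2} + \frac{n_2 T}{\theta^2} = 0.$$

This implies 

$$n_1 \theta = \sum_{i=1}^{n_1} x_i + n_2 T$$

or 

$$\theta = \frac{1}{n_1} \sum_{i=1}^{n_1} x_i + \frac{n_2}{n_1} T = \bar{x} + \frac{n_2}{n_1} T.$$


Hence, the MLE is 

$$\hat{\theta}_{ML} = \bar{x} + \frac{n_2}{n_1} T.$$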


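(b) Let $g(x, y|\theta)$ denote the pdf of the complete data. Because we assumed that all the data 
(censored or not) follow an exponential distribution, we have 


$$g(x, y|\theta) = \frac{1}{\theta^{n_1}}\, e^{-\sum_{i=1}^{n_1} x_i/\theta}\; \frac{1}{\theta^{n_2}}\, e^{-\sum_{i=1}^{n_2} y_i/\theta},$$

so that 

$$\ln g(x, y|\theta) = -n_1 \ln\theta - \sum_{i=1}^{n_1} \frac{x_i}{\theta} - n_2 \ln\theta - \sum_{i=1}^{n_2} \frac{y_i}{\theta}.$$

For the E-step of the EM algorithm, we first compute 


$$E_{\theta_0}(Y) = e^{T/\theta_0} \int_T^{\infty} \frac{y}{\theta_0}\, e^{-y/\theta_0}\, dy = T + \theta_0 \quad \text{(using integration by parts).}$$

So, we get 


$$\begin{aligned}
Q(\theta|\theta_0, x) &= E_{\theta_0}[\ln g(x, y|\theta)] \\
&= E_{\theta_0}\left[-n_1 \ln\theta - \sum_{i=1}^{n_1} \frac{x_i}{\theta} - n_2 \ln\theta - \sum_{i=1}^{n_2} \frac{y_i}{\theta}\right] \\
&= -n_1 \ln\theta - \sum_{i=1}^{n_1} \frac{x_i}{\theta} - n_2 \ln\theta - \frac{1}{\theta} \sum_{i=1}^{n_2} E_{\theta_0}(y_i) \\
&= -n_1 \ln\theta - \sum_{i=1}^{n_1} \frac{x_i}{\theta} - n_2 \ln\theta - \frac{1}{\theta}\, n_2 (T + \theta_0) \\
&= -n_1 \ln\theta - \sum_{i=1}^{n_1} \frac{x_i}{\theta} - n_2 \ln\theta - \frac{n_2 T}{\theta} - \frac{n_2 \theta_0}{\theta}.
\end{aligned}$$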

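For the M-step, we differentiate $Q(\theta|\theta_0, x)$ with respect to $\theta$ and equate the derivative to zero, 


$$\begin{aligned}
\frac{\partial}{\partial\theta} Q(\theta|\theta_0, x) &= \frac{\partial}{\partial\theta}\left[-n_1 \ln\theta - \sum_{i=1}^{n_1} \frac{x_i}{\theta} - n_2 \ln\theta - \frac{n_2 T + n_2 \theta_0}{\theta}\right] \\
&= -\frac{n_1}{\theta} + \frac{\sum_{i=1}^{n_1} x_i}{\theta^2} - \frac{n_2}{\theta} + \frac{n_2 T + n_2 \theta_0}{\theta^2} = 0,
\end{aligned}$$

which gives 

$$[n_1 + n_2]\,\theta = \sum_{i=1}^{n_1} x_i + n_2 T + n_2 \theta_0,$$


$$\begin{aligned}
\theta &= \frac{1}{[n_1 + n_2]} \sum_{i=1}^{n_1} x_i + \frac{n_2 T}{[n_1 + n_2]} + \frac{n_2}{[n_1 + n_2]}\, \theta_0 \\
&= \frac{n_1 \bar{x}}{[n_1 + n_2]} + \frac{n_2 T}{[n_1 + n_2]} + \frac{n_2}{[n_1 + n_2]}\, \theta_0.
\end{aligned}$$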


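Thus, for the general $n$, the algorithm is 


$$\theta_{(n+1)} = \frac{n_1 \bar{x}}{[n_1 + n_2]} + \frac{n_2 T}{[n_1 + n_2]} + \frac{n_2}{[n_1 + n_2]}\, \theta_{(n)}.$$


Now putting $\theta_{(k+1)} = \theta_{(k)} = \theta^*$ in the previous equation and solving for $\theta^*$, we have that the EM 
sequence $\{\theta_{(k)}\}$ has the MLE $\hat{\theta}_{ML}$ as its unique limit point, as $k \to \infty$. That is, $\theta^* = \hat{\theta}_{ML}$. 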


(c) We used the following MATLAB code to run the algorithm with the starting value $\theta_0$ as the sample 
mean, that is, 4.5. Here $T = 4$. We run it for 50 iterations. 

A(1) = 4.5;          % starting value theta_0
for n = 2:50
    % EM update with n1 = 17, n2 = 3, T = 4; 4.41 is the xbar value implied by the output below
    A(n) = (17/20)*4.41 + 3*4/20 + (3/20)*A(n-1);
end
A                    % display all 50 iterates


Following is the output. 


4.5000 | 5.0235 | 5.1020 | 5.1138 | 5.1156 | 5.1158 | 5.1159 
5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 
5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 
5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 
5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 
5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 
5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 
5.1159 


Thus $\hat{\theta} = 5.1159$. 
To run with $\theta_0 = 0$, in the previous code, just change A(1) = 0. We get the following output. 


0.0000 | 4.3485 | 5.0008 | 5.0986 | 5.1133 | 5.1155 
5.1158 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 
5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 
5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 
5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 
5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 
5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 
5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 | 5.1159 
5.1159 | 5.1159 


With $\theta_0 = \bar{x} = 4.5$, it took six iteration steps to converge, whereas with $\theta_0 = 0$, it took seven steps to 
converge. Note that in both cases, $\hat{\theta} = 5.1159 = \hat{\theta}_{ML}$. 
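The iteration of part (c) can also be sketched in Python. This is only an illustrative sketch, not the code used in the text: the function name `em_censored_exponential` and its arguments are hypothetical, and the inputs $n_1 = 17$, $n_2 = 3$, $T = 4$, and $\bar{x} = 4.41$ are assumed to be the values implied by the printed output. The loop stops when two successive iterates differ by less than a tolerance, which is the second convergence criterion mentioned earlier.

```python
def em_censored_exponential(xbar, n1, n2, T, theta0, tol=1e-8, max_iter=1000):
    """Return the list of EM iterates for exponential data right-censored at T.

    xbar is the mean of the n1 uncensored times; n2 times are censored at T.
    (Hypothetical helper written for illustration.)
    """
    n = n1 + n2
    theta = theta0
    history = [theta]
    for _ in range(max_iter):
        # E-step: a censored time has conditional mean T + theta.
        # M-step: substitute that mean into the complete-data MLE.
        new_theta = (n1 * xbar + n2 * T + n2 * theta) / n
        history.append(new_theta)
        if abs(new_theta - theta) < tol:
            break
        theta = new_theta
    return history


hist = em_censored_exponential(xbar=4.41, n1=17, n2=3, T=4, theta0=4.5)
mle = 4.41 + 3 * 4 / 17  # closed-form MLE from part (a): xbar + n2*T/n1
print(round(hist[1], 4), round(hist[-1], 4), round(mle, 4))
```

Because the update is linear in $\theta$ with slope $n_2/(n_1 + n_2) = 0.15$, the error shrinks by that factor at every step, which is why both starting values settle within a handful of iterations.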


Example 13.4.1 is a simple case, where there is no need for iterative computation of $\hat{\theta}_{ML}$. However, 
this demonstrates how the EM algorithm would work. These types of problems are abundant in the 
medical field. For example, we may be interested in the survival times of $n$ patients after a treatment. For 
practical reasons, we may be observing only for a fixed duration, such as 10 years. In Example 13.4.1, 
the vector $x$ will represent the time of death for the $n_1$ individuals. For the remaining $n_2 = n - n_1$ 
individuals, the only data we have state that they survived for more than 4 years. Thus the value of 
T is 4. There is a possibility that during these experimental times, we may lose contact with some 
individuals, perhaps because they moved to some other place or they simply refused to participate in 
this experiment. In those cases, we will know only that the individual survived until we lost contact. 
This generalization of Example 13.4.1 to where the survival time data are different for each observation 
is given in Exercise 13.4.5. 


We now give a similar example with a normal sample. 


Example 13.4.2 
Let $x = (x_1, \ldots, x_{n_1})$ be observed data from a normal population with mean $\theta$ and variance 1. Let the 
censored observations at $T$ be $y = (y_1, \ldots, y_{n_2})$ (that is, the survival time is at least $T$) from the same 
population. Assume that the two sets of observations $\{x_i\}$ and $\{y_i\}$ are independent. Write down an EM 
algorithm to estimate $\theta$. 


Solution 
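For the uncensored observed sample $x_1, \ldots, x_{n_1}$, the likelihood function is 


$$L(\theta|x) = f_X(x|\theta) = \frac{1}{(\sqrt{2\pi})^{n_1}}\, e^{-\frac{1}{2} \sum_{i=1}^{n_1} (x_i - \theta)^2}.$$

Furthermore, the complete likelihood for both the samples is 

$$L_c(\theta|x, y) = \frac{1}{(\sqrt{2\pi})^{n_1}}\, e^{-\frac{1}{2} \sum_{i=1}^{n_1} (x_i - \theta)^2}\; \frac{1}{(\sqrt{2\pi})^{n_2}}\, e^{-\frac{1}{2} \sum_{i=1}^{n_2} (y_i - \theta)^2}. \qquad (13.3)$$

From the definition of $Q(\theta|\theta_0, x)$, we obtain 

$$Q(\theta|\theta_0, x) = E_{\theta_0}[\ln L_c(\theta|x, y)] \qquad (13.4)$$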


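where the expectation is taken with respect to the conditional pdf 


$$h(y|\theta_0, x, T) = \frac{1}{\sqrt{2\pi}}\, e^{-(y - \theta_0)^2/2}\, \frac{1}{1 - F_Y(T|\theta_0)} = \frac{1}{\sqrt{2\pi}}\, \frac{e^{-(y - \theta_0)^2/2}}{1 - \Phi(T - \theta_0)}, \quad y > T,$$

where 

$$F_Y(T|\theta_0) = \int_{-\infty}^{T} \frac{1}{\sqrt{2\pi}}\, e^{-(y - \theta_0)^2/2}\, dy = \int_{-\infty}^{T - \theta_0} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\, du = \Phi(T - \theta_0).$$

Thus, from (13.3) and (13.4), 

$$\begin{aligned}
Q(\theta|\theta_0, x) &= \sum_{i=1}^{n_1} \ln\left[\frac{1}{\sqrt{2\pi}}\, e^{-(x_i - \theta)^2/2}\right] + \sum_{j=1}^{n_2} E_{\theta_0} \ln\left[\frac{1}{\sqrt{2\pi}}\, e^{-(y_j - \theta)^2/2}\right] \\
&= \sum_{i=1}^{n_1} \ln\left[\frac{1}{\sqrt{2\pi}}\, e^{-(x_i - \theta)^2/2}\right] + n_2 \int_T^{\infty} \ln\left[\frac{1}{\sqrt{2\pi}}\, e^{-(y - \theta)^2/2}\right] \frac{1}{\sqrt{2\pi}}\, \frac{e^{-(y - \theta_0)^2/2}}{1 - \Phi(T - \theta_0)}\, dy.
\end{aligned}$$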


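Now taking the derivative with respect to $\theta$, 


$$\begin{aligned}
\frac{\partial Q}{\partial\theta} &= \sum_{i=1}^{n_1} (x_i - \theta) + n_2 \int_T^{\infty} (y - \theta)\, \frac{1}{\sqrt{2\pi}}\, \frac{e^{-(y - \theta_0)^2/2}}{1 - \Phi(T - \theta_0)}\, dy \\
&= \sum_{i=1}^{n_1} x_i - n_1 \theta + \frac{n_2\, \phi(T - \theta_0)}{1 - \Phi(T - \theta_0)} - n_2 (\theta - \theta_0),
\end{aligned}$$

with $\phi$ the standard normal density. 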


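Solving $\partial Q/\partial\theta = 0$, and letting $n = n_1 + n_2$, we obtain 


$$\theta = \frac{1}{n} \sum_{i=1}^{n_1} x_i + \frac{n_2}{n}\, \theta_0 + \frac{n_2}{n}\, \frac{\phi(T - \theta_0)}{1 - \Phi(T - \theta_0)}. \qquad (13.5)$$

From (13.5), we obtain the EM algorithm as 

$$\theta_{(m+1)} = \frac{1}{n} \sum_{i=1}^{n_1} x_i + \frac{n_2}{n}\, \theta_{(m)} + \frac{n_2}{n}\, \frac{\phi(T - \theta_{(m)})}{1 - \Phi(T - \theta_{(m)})}$$

where $\Phi$ is the cumulative distribution function of a standard normal random variable and $\phi$ is the corresponding density. 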
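The EM sequence of Example 13.4.2 can be sketched in a few lines of Python. The function names and the small data set below are hypothetical and serve only as an illustration; $\Phi$ is computed from `math.erf`, and the sketch assumes that $T - \theta_{(m)}$ stays moderate so that $1 - \Phi(T - \theta_{(m)})$ does not underflow.

```python
import math


def std_normal_pdf(z):
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z):
    """Standard normal cdf Phi(z), written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def em_censored_normal(x, n2, T, theta0, tol=1e-10, max_iter=10000):
    """EM iteration for N(theta, 1) data with n2 observations censored at T."""
    n1 = len(x)
    n = n1 + n2
    sum_x = sum(x)
    theta = theta0
    for _ in range(max_iter):
        # E-step: E[Y | Y > T] = theta + phi(T - theta) / (1 - Phi(T - theta)).
        ratio = std_normal_pdf(T - theta) / (1.0 - std_normal_cdf(T - theta))
        # M-step: the update obtained from (13.5).
        new_theta = sum_x / n + (n2 / n) * theta + (n2 / n) * ratio
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta


# Hypothetical data: six uncensored values and two values censored at T = 1.5.
x_obs = [0.2, -0.5, 1.1, 0.7, -0.3, 0.9]
theta_hat = em_censored_normal(x_obs, n2=2, T=1.5, theta0=sum(x_obs) / len(x_obs))
print(theta_hat)
```

At the limit the iterate satisfies $n_1 \theta = \sum x_i + n_2\, \phi(T - \theta)/[1 - \Phi(T - \theta)]$, so the estimate is pulled above the uncensored sample mean, as one would expect when the largest values are censored.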


We have seen that the incomplete data could occur as a result of missing data, or the complete data 
may contain variables that are not observable (hidden). The following is an example of the latter 
situation. 




Example 13.4.3 

Suppose that in a set of $n$ twin pairs of children, $n_1$ are male twin pairs, $n_2$ are female twin pairs, and 
$n_3 = n - (n_1 + n_2)$ are opposite-sex twin pairs. Let $p$ be the probability that a twin pair is identical and $q$ 
be the probability that a child is male. It is not known which pairs of same-sex twins are identical. Obtain an 
EM sequence for $\theta = (p, q)$. 


Solution 

We have $n = n_1 + n_2 + n_3$, and $\theta = (p, q)$ is the parameter vector. Let $x = (n_1, n_2, n_3)$ be the observed 
data. Because we do not know which pairs of the same sex are identical, postulate the complete data set as 
$z = (n_{11}, n_{12}, n_{21}, n_{22}, n_3)$, where $n_{11}$ is the number of male identical pairs, $n_{21}$ is the number of female 
identical pairs, and $n_{12}$ and $n_{22}$ are the nonidentical pairs for males and females, respectively. Here, the 
complete data, $z$, has a multinomial distribution with the likelihood given by 


$$\begin{aligned}
L(z, \theta) = f(z|\theta) = \binom{n}{n_{11}, n_{12}, n_{21}, n_{22}, n_3} & (pq)^{n_{11}} \left[(1 - p) q^2\right]^{n_{12}} \left[p(1 - q)\right]^{n_{21}} \\
& \times \left[(1 - p)(1 - q)^2\right]^{n_{22}} \left[2(1 - p) q (1 - q)\right]^{n_3}
\end{aligned}$$


where the identical twins involve one choice of sex and the nonidentical twins involve two choices of sex. 
The log-likelihood for the complete data is 


$$\begin{aligned}
\ln f(z|\theta) = {} & (n_{11} + n_{21}) \ln p + (n_{12} + n_{22} + n_3) \ln(1 - p) \\
& + (n_{11} + 2n_{12} + n_3) \ln q + (n_{21} + 2n_{22} + n_3) \ln(1 - q) + \text{constant}.
\end{aligned}$$


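For the E-step, use Bayes' rule to obtain the following: 

$$n_{11}^{(k)} = E(n_{11}|x, \theta_{(k)}) = n_1\, \frac{p_{(k)}\, q_{(k)}}{p_{(k)}\, q_{(k)} + (1 - p_{(k)})\, q_{(k)}^2}$$


$$n_{12}^{(k)} = E(n_{12}|x, \theta_{(k)}) = n_1\, \frac{(1 - p_{(k)})\, q_{(k)}^2}{p_{(k)}\, q_{(k)} + (1 - p_{(k)})\, q_{(k)}^2}$$


$$n_{21}^{(k)} = E(n_{21}|x, \theta_{(k)}) = n_2\, \frac{p_{(k)}\, (1 - q_{(k)})}{p_{(k)}\, (1 - q_{(k)}) + (1 - p_{(k)})\, (1 - q_{(k)})^2}$$


$$n_{22}^{(k)} = E(n_{22}|x, \theta_{(k)}) = n_2\, \frac{(1 - p_{(k)})\, (1 - q_{(k)})^2}{p_{(k)}\, (1 - q_{(k)}) + (1 - p_{(k)})\, (1 - q_{(k)})^2}$$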

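Thus, the Q-function is given by 

$$\begin{aligned}
Q(\theta|\theta_{(k)}, x) = {} & \left(n_{11}^{(k)} + n_{21}^{(k)}\right) \ln p + \left(n_{12}^{(k)} + n_{22}^{(k)} + n_3\right) \ln(1 - p) \\
& + \left(n_{11}^{(k)} + 2n_{12}^{(k)} + n_3\right) \ln q + \left(n_{21}^{(k)} + 2n_{22}^{(k)} + n_3\right) \ln(1 - q) + \text{constant}.
\end{aligned}$$


It can be verified that the M-step gives the following: 


$$p_{(k+1)} = \frac{n_{11}^{(k)} + n_{21}^{(k)}}{n}$$

$$q_{(k+1)} = \frac{n_{11}^{(k)} + 2n_{12}^{(k)} + n_3}{n + n_{12}^{(k)} + n_{22}^{(k)} + n_3}.$$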
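The E-step and M-step of Example 13.4.3 can be put together as in the following Python sketch. The function name `em_twins`, the starting values, and the counts used in the usage line are hypothetical; the counts were chosen so that the observed proportions match $p = 0.4$ and $q = 0.5$ exactly, which gives a simple check on the iteration. The denominator of the $q$ update is the sum of the exponents of $q$ and $1 - q$ in the Q-function.

```python
def em_twins(n1, n2, n3, p0=0.5, q0=0.5, n_iter=500):
    """EM sequence for theta = (p, q) in the twin-pair model.

    n1, n2, n3: observed male, female, and opposite-sex pair counts.
    (Hypothetical helper written for illustration.)
    """
    n = n1 + n2 + n3
    p, q = p0, q0
    for _ in range(n_iter):
        # E-step: split the same-sex pairs into identical and nonidentical
        # pairs by Bayes' rule, using the current (p, q).
        male_total = p * q + (1 - p) * q ** 2
        female_total = p * (1 - q) + (1 - p) * (1 - q) ** 2
        n11 = n1 * p * q / male_total
        n12 = n1 - n11
        n21 = n2 * p * (1 - q) / female_total
        n22 = n2 - n21
        # M-step: maximize the Q-function in p and q.
        p = (n11 + n21) / n
        q = (n11 + 2 * n12 + n3) / (n + n12 + n22 + n3)
    return p, q


p_hat, q_hat = em_twins(350, 350, 300)
print(round(p_hat, 4), round(q_hat, 4))
```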
Replacing the log-likelihoods with log-posteriors, the EM algorithm can also be used for computations 
related to Bayesian analysis to find the posterior mode of $\theta$. In the context of incomplete data 
coming from mixtures of parametric families, the EM algorithm provides a very powerful numerical 
technique. In this book, we will not go into the mixture models. The steps necessary to compute 
the required quantities depend on the particular application, and thus there is no single general way to code the 
EM algorithm. There are special cases available in some software packages such as SAS 
using PROC MI with EM option when the data come from a multivariate normal distribution. It is 
desirable to search the literature on the particular software you are using to find out the availability 
of "EM codes" to suit the particular application in which you are interested. Another difficulty 
with implementation of the EM algorithm is that each E-step requires computation of the conditional 
expectation. To overcome this difficulty, Wei and Tanner in 1990 proposed an algorithm 
called MCEM (Monte Carlo EM) based on the Monte Carlo approach explained in Section 13.5. This 
basically involves simulating $m$ variables, $y_1, \ldots, y_m$, from the conditional distribution $h(y|\theta_{(n)}, x)$ 
and then maximizing the approximate complete data log-likelihood 


$$\hat{Q}(\theta|\theta_{(n)}, x) = \frac{1}{m} \sum_{i=1}^{m} \ln g(x, y_i|\theta).$$


We will not go into the details of these methods. The student may refer to Wei and Tanner's paper for 
further details. 
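To give a feeling for the Monte Carlo E-step, here is a rough Python sketch of MCEM applied to the censored exponential model of Example 13.4.1. It is not Wei and Tanner's implementation; the function name, the number of simulated vectors `m`, and the seed are assumptions made for illustration. For this model a censored time, given that it exceeds $T$, is $T$ plus an exponential variable with mean $\theta_{(n)}$, and the averaged complete data log-likelihood is maximized in closed form.

```python
import random


def mcem_censored_exponential(sum_x, n1, n2, T, theta0, m=2000, n_iter=30, seed=0):
    """Monte Carlo EM for exponential data right-censored at T (illustrative only)."""
    rng = random.Random(seed)
    n = n1 + n2
    theta = theta0  # must be positive
    for _ in range(n_iter):
        # Monte Carlo E-step: simulate m vectors of the n2 censored times
        # from h(y | theta, x) and average their sums.
        total_y = 0.0
        for _draw in range(m):
            for _j in range(n2):
                total_y += T + rng.expovariate(1.0 / theta)
        mean_sum_y = total_y / m
        # M-step: maximizer of (1/m) * sum_i ln g(x, y_i | theta).
        theta = (sum_x + mean_sum_y) / n
    return theta


theta_mc = mcem_censored_exponential(sum_x=17 * 4.41, n1=17, n2=3, T=4, theta0=4.5)
print(round(theta_mc, 2))
```

Unlike the exact EM sequence, the MCEM iterates do not settle at a single value; they fluctuate around the MLE with a spread that shrinks as `m` grows.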


EXERCISES 13.4 


13.4.1. Suppose that $Y$ is a noise-corrupted observation of a signal $S$. That is, $Y = S + N$, where $S$ 
is independent of $N$. Assume that for a known $\sigma$, $N \sim N(0, \sigma^2)$ and $S \sim N(0, \theta^2)$, where $\theta$ 
is unknown. Given the observation $Y = y$: 


(a) Obtain the MLE, $\hat{\theta}_{ML}$. 
(b) Obtain an EM algorithm. 


13.4.2. Let $X_1, \ldots, X_{n_1}$ be an observed random sample and $X_{n_1+1}, \ldots, X_n$ be the missing (at 
random) observations. Assume that $X_i$ are iid from an $N(\mu, \sigma^2)$ distribution. 


(a) Show that $\left(\sum_{i=1}^{n} x_i, \sum_{i=1}^{n} x_i^2\right)$ are sufficient statistics for $\theta = (\mu, \sigma^2)$. 

(b) Obtain the EM sequence for $\theta = (\mu, \sigma^2)$. 

(c) Consider a censored normal sample with n = 10, with the largest three being censored 
[Gupta]. 


1.613 1.644 1.663 1.732 1.740 1.763 1.778 


Using the results of part (a), obtain an EM estimate of $\theta = (\mu, \sigma^2)$ with an arbitrary starting 
point. 


13.4.3. In Example 13.4.3, suppose that $q$ is the probability that a child is a female. Obtain an EM 
sequence for $\theta = (p, q)$. 


13.4.4. Let $x = (x_1, \ldots, x_{n_1})$ be observed data and let $(x_{n_1+1}, \ldots, x_n)$ be censored observations (that is, in the $i$th 
experiment, if $i > n_1$, the survival time is at least $y_i$). Let the new complete censored data $y_i$ 
be such that 


$x_i, \quad i \le n_1$ 

$y_i, \quad i > n_1.$ 

Let the mean survival time be $\theta$ and the probability density of $y$ be 

$$f(y|\theta) = \theta^{-1} \exp(-y/\theta), \quad y > 0$$


and let the survival function be defined as the probability that an individual survives beyond 
time $y$, that is, $S(y) = P(Y > y)$. Thus, 


$$S(y) = \exp(-y/\theta), \quad y > 0.$$


(a) Obtain the MLE, $\hat{\theta}_{ML}$. 
(b) Obtain an EM algorithm. 


13.4.5. Let $x = (x_1, \ldots, x_{n_1})$ be observed data and the censored observations be $y = (y_1, \ldots, y_{n_2})$ 
(that is, in the $i$th experiment, if $i > n_1$, the survival time is at least $y_i$). Let the mean survival 
time be $\theta$, and the probability density be given by 


$$f(x|\theta) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2}(x - \theta)^2\right).$$


(a) Obtain the MLE, $\hat{\theta}_{ML}$. 
(b) Obtain an EM algorithm. 


13.5 INTRODUCTION TO MARKOV CHAIN MONTE CARLO 


In this section, we give a brief introduction to Markov chain Monte Carlo (MCMC) methods. Among 
the computational simulation methods, MCMC is enormously useful for realistic statistical modeling. 
Markov chain Monte Carlo methods were initially developed and used in physics. These methods 
have had profound influence on statistics over the past two decades, especially in Bayesian inference. 
MCMC methods are used to solve problems in many diverse areas such as archaeology, biology, 
biophysics, computational chemistry, computer graphics, finance, nuclear medicine, transport theory, 
and zoology. These methods have enabled researchers to exploit a degree of complexity and realism 
in modeling and analysis of problems in these areas that were previously beyond reach. The name 
Monte Carlo method was coined by Stan Ulam and John von Neumann, who introduced this method 
to solve neutron shielding and other related problems at Los Alamos in the early 1940s. 


The popular MCMC procedures make use of two standard algorithms: the Metropolis algorithm, and 
the Gibbs sampler. In the Metropolis approach, all the parameters are varied at once. In the Gibbs 
method, each variable of the target pdf is changed one at a time. An improvement on Metropolis, 
called the Metropolis—Hastings algorithm, was introduced by Hastings in 1970. There are other hybrid 
methods, such as the Hamiltonian method that alternates between Gibbs and Metropolis procedures. 
In our present study, we will explain only the first three methods, namely, the Metropolis algorithm, 
the Metropolis—Hastings algorithm, and the Gibbs sampler. 


The objective of MCMC techniques is to generate random variables having certain distributions called 
target distributions with pdf $\pi(x)$. The simulation of standard distributions is readily available in many 
statistical software packages, such as Minitab. In cases where the functional form of $\pi(x)$ is not known, 
MCMC techniques become very useful. The basic idea of MCMC methods is to find a Markov chain 
with a stationary distribution that is the same as the desired probability distribution $\pi(x)$; this is the 
target distribution. Run the Markov chain for a long time (say, $K$ iterations) and observe in which state 
the chain is after these $K$ iterations. The probability that the chain is in state $x$ will be approximately 
the same as the probability that the discrete random variable equals $x$. 


In Bayesian analysis, whether we are finding a posterior distribution or a Bayesian estimate (usually, 
the posterior mean), integration is involved. We know from calculus that closed-form solutions 
for integrals are too difficult to obtain for all but some simple functions. 
A standard approach to numerical integration of a function $f(x)$ is to first divide the range of integration 
$R$ into $n$ segments with points $x_1, \ldots, x_n$, calculate the value of $f(x)$ at each of these points, $f(x_1), \ldots, f(x_n)$, 
multiply the values by the length of each segment, and sum these rectangles to approximate the 
integral, which is the area under the curve. The error in this approximation is reduced by increasing 
the number of segments $n$. 


In Monte Carlo integration, instead of taking $x_1, \ldots, x_n$ as fixed deterministic numbers, we proceed to 
draw a random sample from a uniform distribution over the range of integration $R$, then evaluate 
$f(x_i)$ for each $x_i$ and take the average, scaled by the length of $R$. This assumes that the range $R$ is bounded. If $R$ is not bounded, 
then $f(x)$ can be integrated when it can be written as the product of another function $h(x)$ and a 
probability density function $\pi(x)$ from which we can draw values of $x$ (that is, $x_1, \ldots, x_n$ is drawn from the 
distribution $\pi(x)$). That is, 


$$\int f(x)\, dx = \int h(x)\, \pi(x)\, dx$$


where integration is over the range $R$. Then, the integral can be approximated by averaging the 
$h(x_i)$ values, that is, 


$$\int f(x)\, dx \approx \frac{1}{n} \sum_{i=1}^{n} h(x_i)$$


where we assume that the $x_i$ values are a random sample from $\pi(x)$ and in the range $R$. When $\pi(x)$ is a standard 
distribution, many statistical software packages, such as Minitab, can generate random samples 
from this distribution. In those cases, a general coding to evaluate this integral can be written as 

sum = 0 
For i = 1 to n 
{Draw x_i from π(x) 
sum = sum + h(x_i)} 
return sum/n 
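The general coding above translates almost line for line into Python. In the sketch below the function name `mc_integral` and its arguments are hypothetical, and the example integral $\int_0^\infty x^2 e^{-x}\, dx = 2$ is chosen only because its value is known: it is written as $h(x)\pi(x)$ with $h(x) = x^2$ and $\pi$ the exponential density with mean 1.

```python
import random


def mc_integral(h, draw, n, seed=0):
    """Approximate the integral of h(x) * pi(x) by averaging h over draws from pi.

    h    : the function h(x)
    draw : a function that takes a random generator and returns one draw from pi
    n    : number of draws
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x_i = draw(rng)      # draw x_i from pi(x)
        total += h(x_i)      # sum = sum + h(x_i)
    return total / n


# Integral of x^2 * exp(-x) over (0, infinity), whose exact value is 2.
estimate = mc_integral(lambda x: x * x, lambda rng: rng.expovariate(1.0), n=200_000)
print(estimate)
```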


In the preceding coding, by multiplying $h(x_i)$ by the indicator function of $R$ (that is, $I_R(x_i) = 1$ 
if $x_i \in R$, and zero otherwise), we can avoid the assumption that the $x_i$ values are in the range $R$. For 
instance, let $X_1, \ldots, X_n$ be a random sample generated from a target pdf, $\pi(x)$. Then the expectation 
of any function $f(X)$ can be estimated using the Monte Carlo method by 


$$E_\pi f(X) = \int f(x)\, \pi(x)\, dx \approx \frac{1}{n} \sum_{i=1}^{n} f(X_i) = \bar{f}$$


where $E_\pi$ denotes the expectation with respect to the pdf $\pi(x)$. By the law of large numbers, it 
follows that 


$$\frac{1}{n} \sum_{i=1}^{n} f(X_i) \to E_\pi[f(X)] \quad \text{as } n \to \infty$$


provided $X_1, \ldots, X_n$ are independent. We can verify that $\bar{f}$ is an unbiased estimate of $E_\pi f$. In 
addition, the sampling distribution of $\bar{f}$ is approximately normal, with variance $\sigma^2/n$, where $\sigma^2$ is 
estimated by 

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} \left(f(X_i) - \bar{f}\right)^2.$$


For example, in a Bayesian setting, an estimate of the posterior mean can be obtained by taking 
$f(x) = x$, and the variance can be obtained by taking $f(x) = (x - \bar{x})^2$, if $\pi(x)$ is the posterior 
distribution (recall that in Chapter 11, we used the notation $\pi(\theta|x)$ for the posterior distribution). 
Using the sampling distribution of $\bar{f}$, we can also construct point and interval estimates for $E_\pi f$. 
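A point estimate together with an approximate interval based on the normal sampling distribution of $\bar{f}$ might be computed as follows. This is a sketch under stated assumptions: the helper name `mc_mean_with_interval` is hypothetical, the "posterior" is simply taken to be a gamma distribution with shape 3 and scale 0.5 (mean 1.5) so that direct sampling is possible, and the sample variance uses the divisor $n - 1$.

```python
import math
import random
import statistics


def mc_mean_with_interval(f, draws, z=1.96):
    """Return f_bar and an approximate 95% interval f_bar +/- z * s / sqrt(n)."""
    values = [f(x) for x in draws]
    n = len(values)
    f_bar = statistics.fmean(values)
    s2 = statistics.variance(values)      # sample variance, divisor n - 1
    half_width = z * math.sqrt(s2 / n)
    return f_bar, (f_bar - half_width, f_bar + half_width)


rng = random.Random(1)
# Assumed posterior for illustration: gamma with shape 3 and scale 0.5.
posterior_draws = [rng.gammavariate(3.0, 0.5) for _ in range(50_000)]
post_mean, (lower, upper) = mc_mean_with_interval(lambda t: t, posterior_draws)
print(post_mean, lower, upper)
```

The half-width shrinks like $1/\sqrt{n}$, so four times as many draws are needed to halve the width of the interval.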


Observe that the heart of the Monte Carlo method is to obtain random samples from the target 
distribution $\pi(x)$. One of the problems encountered using this approach is that, while it is easy to 
generate samples from standard distributions using popular statistical software packages, it is very 
difficult (sometimes not feasible) to do so from any distribution that is not standard (see Project 
4A for a method of generating random samples from a given distribution). For these reasons, the 
ordinary Monte Carlo method can be implemented in only a very few cases for Bayesian inference. 
That is where the Markov chain Monte Carlo method plays a crucial role. 


Using the MCMC methods, we will construct a Markov chain $\{X_n\}$ with a limiting distribution as the 
target distribution, $\pi(x)$. Let us first introduce the concept of Markov chains. For a brief description 
of Markov chains, refer to Appendix A2. We call a sequence of random variables $\{X_n\}$ a Markov chain 
(MC) with state space $S$ if 


$$P(X_{n+1} = j \mid X_0 = i_0, \ldots, X_{n-1} = i_{n-1}, X_n = i) = P(X_{n+1} = j \mid X_n = i).$$


That is, the probability distribution of future states of a MC depends only on the present state and not 
on the past states. However, it is important to note that a Markov chain $\{X_n\}$ is a dependent sequence 
of random variables; thus, the independence assumption inherent in a random sample cannot be 
used. The transition probability function of a discrete parameter Markov chain is defined as 


$$p_{m,n}(x, y) = P(X_n = y \mid X_m = x), \quad x, y \text{ in } S.$$


We simply denote the one-step transition probability by $p(x, y)$. When the number of elements in the state 
space $S$ is finite, we can form a matrix $P$ with the $(x, y)$th element being $p(x, y)$. This matrix is called 
a one-step transition probability matrix. A distribution $\pi(x)$ is called an invariant (limiting) distribution if it satisfies the 
equation 


$$\pi(x) = \sum_{y \in S} \pi(y)\, p(y, x).$$


We say that the chain satisfies the reversibility or detailed balance condition if $\pi(x)\, p(x, y) = \pi(y)\, p(y, x)$ 
holds for some $\pi(\cdot)$ and all $x, y$ in $S$. It can be shown that such a $\pi(x)$ that satisfies the reversibility condition is 
invariant (sum both sides over $y$ and use $\sum_{y} p(x, y) = 1$). Basically, if a Markov chain is reversible and its limiting distribution exists, then the limiting 
distribution is the invariant distribution. 
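Both conditions are easy to check numerically for a small chain. The sketch below uses two hypothetical three-state chains: one that satisfies detailed balance (and hence invariance), and a cyclic chain for which the uniform distribution is invariant although detailed balance fails. The function names are illustrative only.

```python
def is_reversible(pi, P, tol=1e-12):
    """Check detailed balance: pi(x) p(x, y) = pi(y) p(y, x) for all x, y."""
    states = range(len(pi))
    return all(abs(pi[x] * P[x][y] - pi[y] * P[y][x]) <= tol
               for x in states for y in states)


def is_invariant(pi, P, tol=1e-12):
    """Check invariance: pi(x) = sum over y of pi(y) p(y, x) for all x."""
    states = range(len(pi))
    return all(abs(pi[x] - sum(pi[y] * P[y][x] for y in states)) <= tol
               for x in states)


# A reversible three-state chain (hypothetical numbers).
P_rev = [[0.50, 0.50, 0.00],
         [0.25, 0.50, 0.25],
         [0.00, 0.50, 0.50]]
pi_rev = [0.25, 0.50, 0.25]

# A chain that moves 0 -> 1 -> 2 -> 0 with probability 1.
P_cyc = [[0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0],
         [1.0, 0.0, 0.0]]
pi_cyc = [1 / 3, 1 / 3, 1 / 3]

print(is_reversible(pi_rev, P_rev), is_invariant(pi_rev, P_rev))
print(is_reversible(pi_cyc, P_cyc), is_invariant(pi_cyc, P_cyc))
```

The second chain shows that reversibility is sufficient for invariance but not necessary.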


The results explained for discrete Markov chains can be extended to Markov chains defined on a 
continuous state space. The stationary or the equilibrium distribution $\pi(x)$ of a continuous Markov 
chain satisfies 


$$\pi(x) = \int p(y, x)\, \pi(y)\, dy.$$


Assume that the samples are generated from a Markov chain whose equilibrium distribution is the 
target distribution, $\pi(x)$. We know by the law of large numbers that 


$$\frac{1}{n} \sum_{i=1}^{n} f(X_i) \to E_\pi[f(X)] \quad \text{as } n \to \infty$$

provided $X_1, \ldots, X_n$ are independent. It turns out that, if we generate a Markov chain $X_1, \ldots, X_n$ 
whose stationary distribution is the target distribution $\pi(x)$, the result 


$$\frac{1}{n} \sum_{i=1}^{n} f(X_i) \to E_\pi[f(X)] \quad \text{as } n \to \infty$$

still holds. In this sense, the chain $\{X_i\}$ resulting from an MCMC algorithm with stationary distribution 
$\pi$ is similar to the use of a random sample from $\pi$. The analytical details are beyond 
the scope of this book. Instead, we focus on the question: How do we construct a Markov chain 
whose stationary distribution is our target distribution, $\pi(x)$? The answer is given by the Metropolis-Hastings 
algorithm, and the two special cases: the Metropolis algorithm, and the Gibbs sampler. 
A Markov chain Monte Carlo method for simulating a distribution $\pi$ can be defined as any method 
that produces an ergodic Markov chain $\{X_i\}$ whose stationary distribution is $\pi$. We start with the 
Metropolis algorithm. Subsequently, we will explain both the Metropolis-Hastings algorithm and 
the Gibbs sampler. MCMC methods are increasingly being used for simulation of complex probability 
models, for computation of integrals, and for optimization. 


13.5.1 Metropolis Algorithm 


One of the simplest algorithms in MCMC calculations is the Metropolis algorithm, introduced by 
the Greek-American mathematician Nicholas Constantine Metropolis and colleagues in 1953. This 
work was mentioned in Computing in Science and Engineering as being among the top 10 algorithms 
having the “greatest influence on the development and practice of science and engineering in the 20th 
century.” In this case, we make a trial perturbation from the current position in a parameter space by 
randomly selecting a trial step from a symmetric probability distribution called candidate-generating 
density or proposal density $q(x, y)$ (in the discrete case, it is a symmetric matrix called the nominating 
matrix $A = (a_{ij})$, with $a_{ij} = a_{ji}$). The $q(x, y)$ depends only on the current state $x$ and the new proposed 
state $y$ (that is, $q(x, y) = q_x(y)$ is a function of the next proposed state $y$ that is allowed to depend on 
the current state $x$). Thus, starting at $x$, $q(x, y)$ can be regarded as the conditional density of landing 
at y in one transition step. The trial step is either accepted or rejected on the basis of the probability 
of the new position relative to the previous one. 


We now give the Metropolis algorithm for a discrete distribution. We want to obtain a sample from 
a distribution $\{\pi_j\}$, where $\pi(j) = P(X_{k+1} = j)$, and we have a symmetric nominating matrix $A$; then 
we can write the Metropolis algorithm as follows. 


METROPOLIS ALGORITHM (DISCRETE CASE) 
For $k = 0$, start with an arbitrary point, $x_k = i$. 
1. Generate $j$ from the probability distribution $\{a_{ij}, j = 1, 2, \ldots\}$. 
2. Set 
$$r = \frac{\pi(j)}{\pi(i)}.$$
3. If $r \ge 1$, set $x_{k+1} = j$ (acceptance); 
otherwise generate $u$ from Uniform(0, 1), 
if $u < r$, set $x_{k+1} = j$ (acceptance), 
else $x_{k+1} = x_k$ (rejection); (note that the value of $x_{k+1}$ becomes the next state). 
4. Set $k = k + 1$, go to step 1. 


Each state $x_k$ of the resulting chain, including a value repeated after a rejection, is considered to be a sample value from the target distribution $\{\pi_j\}$. 


The continuous case of the Metropolis algorithm is given next. 


METROPOLIS ALGORITHM (CONTINUOUS CASE) 
1. Start with an arbitrary point, $x_0$. 
2. Select a new position $x^* = x_k + \Delta x$, where $\Delta x$ is randomly chosen from a symmetric distribution. 
3. Calculate the ratio 

$$r = \frac{\pi(x^*)}{\pi(x_k)}$$

where $\pi(x)$ is the target distribution. 
4. Accept the trial position, that is, set 


$$x_{k+1} = x^*, \quad \text{if } r \ge 1.$$


Otherwise generate $u$ from Uniform(0, 1). 
If $u < r$, set $x_{k+1} = x^*$; 
else set $x_{k+1} = x_k$. 

5. Set $k = k + 1$, go to step 2. 


If the proposal step size is $dx$, we could use the proposal distribution $U(-dx, dx)$; for example, 
if the step size is 1, then randomly choose $\Delta x \sim U(-1, 1)$. For further discussion on selection of 
the proposal distribution, read Subsection 13.5.4. The Metropolis algorithm generates a set of states 
that is a Markov chain because each state $x_{k+1}$ depends only on the previous state $x_k$. Using Markov 
chain techniques, it can be shown that the equilibrium distribution of the chain constructed by the 
Metropolis algorithm is indeed $\pi(x)$. Note that in the Metropolis algorithm, it is not necessary 
to have the pdf; instead, all that is necessary is to know the ratio $\pi(x^*)/\pi(x_k)$. Thus, none of the 
multiplicative constants in the pdf $\pi$ plays a role in the algorithm. 
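The continuous algorithm with the $U(-dx, dx)$ proposal can be sketched as follows. The function name `random_walk_metropolis` and the choice of target are assumptions for illustration: the target is the standard normal density with its constant $1/\sqrt{2\pi}$ deliberately left out, to show that only the ratio $\pi(x^*)/\pi(x_k)$ is used. The sketch assumes the target is positive at the current state.

```python
import math
import random
import statistics


def random_walk_metropolis(unnormalized_pdf, x0, step, n_steps, seed=0):
    """Metropolis algorithm with a symmetric U(-step, step) proposal."""
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_steps):
        x_star = x + rng.uniform(-step, step)               # trial position
        r = unnormalized_pdf(x_star) / unnormalized_pdf(x)  # ratio of target values
        if r >= 1 or rng.random() < r:                      # accept
            x = x_star
        chain.append(x)                                     # on rejection, x is repeated
    return chain


def target(x):
    """Standard normal density without its normalizing constant."""
    return math.exp(-0.5 * x * x)


chain = random_walk_metropolis(target, x0=0.0, step=1.0, n_steps=200_000)
kept = chain[1000:]  # drop an initial stretch of the chain
print(statistics.fmean(kept), statistics.pvariance(kept))
```

The sample mean and variance of the retained states should be close to 0 and 1, even though successive states are dependent.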


This algorithm works well in most applications. Following is a simple example to show how the 
Metropolis algorithm works. 


Example 13.5.1 
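Using the Metropolis algorithm, generate a random sample from a Poisson distribution with mean $\lambda$. For 
the nominating matrix, use the symmetric matrix with elements 


$$a_{00} = 1/2, \qquad a_{ij} = \begin{cases} 1/2, & j = i - 1 \\ 1/2, & j = i + 1 \\ 0, & \text{otherwise.} \end{cases}$$


Solution 
The nominating probability matrix is a one-step transition matrix (see Appendix A2), 


$$A = \begin{pmatrix}
1/2 & 1/2 & 0 & 0 & 0 & \cdots \\
1/2 & 0 & 1/2 & 0 & 0 & \cdots \\
0 & 1/2 & 0 & 1/2 & 0 & \cdots \\
0 & 0 & 1/2 & 0 & 1/2 & \cdots \\
\vdots & & & & & \ddots
\end{pmatrix}$$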


Now we apply the Metropolis algorithm for generating samples from Poisson($\lambda$) in the following steps. 
Step 1. Start with $x_{n-1} = i$. 
Step 2. Generate $j$ from $A = \{a_{ij}\}$. How do we do it? We can do this using the following procedure: 


For $i \ne 0$, 

Generate $u_1$ from $U(0, 1)$. 

If $u_1 \le 1/2$, set $j = i + 1$; else set $j = i - 1$. 
For $i = 0$, 

If $u_1 < 1/2$, set $j = 0$; 

else set $j = 1$. 


Step 3. Set 
$$r = \frac{\pi(j)}{\pi(i)} = \frac{e^{-\lambda} \lambda^j / j!}{e^{-\lambda} \lambda^i / i!} = \lambda^{j - i}\, \frac{i!}{j!}.$$
Set 
$$r = \begin{cases} 1, & \text{if } i = 0,\ j = 0 \\ \lambda/(i + 1), & \text{if } j = i + 1 \\ i/\lambda, & \text{if } j = i - 1. \end{cases}$$


Step 4. Acceptance/rejection: 


If $r \ge 1$, set $x_n = j$ (i.e., accept the new state $j$). 
Otherwise, generate $u_2$ from $U(0, 1)$; 
if $u_2 < r$, set $x_n = j$ (i.e., accept the new state), 


else set $x_n = x_{n-1}$ (i.e., reject the new state $j$ and keep the current state $i$). 


Step 5. Set $n = n + 1$, go to step 2. 


In Example 13.5.1, let us say we want to generate a random sample from Poisson with $\lambda = 2$ and we 
are at state $i = 3$ in the iteration step $(n - 1)$. If our proposed new state is $j = 4$, then $r = \lambda/(i + 1) = 2/4 = 1/2$. 
Suppose we obtained the value of $u_2$ as 0.672772. Because this value is larger than 1/2, we reject the 
proposed new state 4 and stay at state 3 for the iteration step $n$ (if you generate a new $u_2$, your decision 
might be different). Instead, suppose our proposed step was $j = 2$; then $r = i/\lambda = 3/2 > 1$, and we 
will immediately accept our new state as $j = 2$ (no need to generate a uniform random number; if 
you did, it would have been smaller than 3/2 anyway) for the iteration step $n$. 
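The five steps of Example 13.5.1 can be written compactly in Python. This is a sketch with hypothetical names (`metropolis_poisson`, `start`, `seed`); it keeps every state of the chain, including repeats after a rejection, and uses the three cases for $r$ derived in Step 3.

```python
import random


def metropolis_poisson(lam, n_steps, start=0, seed=0):
    """Metropolis chain on {0, 1, 2, ...} whose target is Poisson(lam)."""
    rng = random.Random(seed)
    i = start
    chain = []
    for _ in range(n_steps):
        # Step 2: propose j from row i of the nominating matrix A.
        if rng.random() < 0.5:
            j = i + 1
        else:
            j = i - 1 if i > 0 else 0   # from state 0 the chain proposes 0 itself
        # Step 3: ratio r = pi(j) / pi(i).
        if j == i:
            r = 1.0
        elif j == i + 1:
            r = lam / (i + 1)
        else:
            r = i / lam
        # Step 4: accept or reject.
        if r >= 1 or rng.random() < r:
            i = j
        chain.append(i)
    return chain


poisson_chain = metropolis_poisson(lam=2.0, n_steps=200_000)
kept_states = poisson_chain[1000:]
print(sum(kept_states) / len(kept_states))
```

With $\lambda = 2$ the average of the retained states should be close to 2, the mean of the target distribution.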


Example 13.5.2 
Let $\pi(x) = c \exp(-f(x))$ be the form of the target distribution function. Write a general Metropolis 
algorithm to generate a sample from $\pi$. 


Solution 
Let $q(x, y)$ be any symmetric distribution. Starting from an arbitrary $x^{(0)}$, we can write the Metropolis 
algorithm through the following steps. 
Step 1. Let $x^{(t)}$ be the current state. 


Step 2. Generate $y$ from the distribution $q(x^{(t)}, y)$. 


Because 


$$r = \frac{\pi(y)}{\pi(x^{(t)})} = \frac{c \exp(-f(y))}{c \exp(-f(x^{(t)}))} = \exp\left(-\left(f(y) - f(x^{(t)})\right)\right),$$


calculate the change in $f$, $\Delta f = f(y) - f(x^{(t)})$. 


Step 3. Generate a random number $u$ from the uniform distribution, $U(0, 1)$. If $u < \exp(-\Delta f)$, set 
$x^{(t+1)} = y$ (accept the proposed new state); otherwise set $x^{(t+1)} = x^{(t)}$ (reject the proposed new 
state). 


Step 4. Set $t = t + 1$ and continue (i.e., go to step 1). 


Note that in the previous example, the normalizing constant in $\pi(x)$ is not important, because it 
cancels in the ratio. In fact this is true in all Metropolis and Metropolis-Hastings algorithms. In 
the special case, where $q(x, y) = q(|y - x|)$, the Metropolis algorithm is also called the random-walk 
Metropolis. Another special choice is $q(x, y) = q(y)$; this is called the independence sampler. In all 
of these cases, it is important to observe that whereas the target distribution is independent of the 
positions, the proposal functions depend on where we are. For example, let $\pi(x)$ be the standard normal 
density, and let the proposal density be a normal density in $y$ centered at the current position $x$, so that it depends on $x$ and $y$ only through 


$$(y - x)^2.$$


Figure 13.1 gives a representation of the target distribution and some representative proposals. For 
each point x of the target distribution, we generate a y from the corresponding proposal distribution. 
Then, according to the accept/reject rule that we specified earlier, we will make a decision whether to 
treat this new value y as being from the target distribution. 


13.5.2 The Metropolis—Hastings Algorithm 

The Metropolis-Hastings (M-H) algorithm is a generalization of the Metropolis algorithm, in which 
we need not assume symmetry of the nominating matrix $A$ (or of the proposal density $q(x, y)$). The 
acceptance probability is given by 


$$\alpha(i, j) = \min\left\{\frac{\pi(j)\, a_{ji}}{\pi(i)\, a_{ij}},\ 1\right\}.$$


This algorithm is the basic building block of MCMC methods. The Metropolis—Hastings algorithm is 
widely used in applied statistics and is very useful for sampling from complicated, high-dimensional 
probability distributions. Now we present the steps involved in the Metropolis—Hastings algorithm 
in the discrete case. 


FIGURE 13.1 Target and proposal densities. 


METROPOLIS-HASTINGS ALGORITHM (DISCRETE CASE) 
For $k = 0$, start with an arbitrary point, $x_k = i$. 
1. Generate $j$ from the nominating distribution $\{a_{ij}, j = 1, 2, \ldots\}$. 
2. Set 
$$r = \frac{\pi(j)\, a_{ji}}{\pi(i)\, a_{ij}}.$$
3. If $r \ge 1$, set $x_{k+1} = j$. 
Otherwise generate $u$ from $U(0, 1)$; 
if $u < r$, set $x_{k+1} = j$, 
else set $x_{k+1} = x_k$. 
4. Set $k = k + 1$, go to step 1. 


In the preceding algorithm, if we calculate $\alpha(i, j) = \min\{r, 1\}$, basically, we accept the proposed 
new step $j$ if $u < \alpha(i, j)$; otherwise we stay at the current step $i$. The resulting Markov chain from 
both Metropolis and Metropolis-Hastings algorithms would have the transition probability matrices 
defined by 


$$p(i, j) = a_{ij}\, \alpha(i, j) \quad \text{for } i \ne j$$
$$p(i, i) = 1 - \sum_{j \ne i} a_{ij}\, \alpha(i, j).$$


In the continuous case, for any given $\pi(x)$, the Metropolis-Hastings algorithm takes the following 
form. To start the algorithm, we choose an arbitrary proposal distribution $q(x, y)$ so that it is easy to 
obtain a sample from this distribution. Define the acceptance/rejection function as 


$$\alpha(x, y) = \min\left\{\frac{\pi(y)\, q(y, x)}{\pi(x)\, q(x, y)},\ 1\right\}.$$


If both $\pi(x)$ and $\pi(y)$ are zero, set $\alpha(x, y) = 0$. 


METROPOLIS-HASTINGS ALGORITHM (CONTINUOUS CASE)

Step 1. Start with an arbitrary point, $x^{(0)}$, and set t = 0.
Step 2. Given a current state $x^{(t)}$, draw y from the proposal distribution $q(x^{(t)}, y)$.
Step 3. Draw u from U[0, 1].
Step 4. If $u < \alpha(x^{(t)}, y)$, set $x^{(t+1)} = y$; otherwise set $x^{(t+1)} = x^{(t)}$.
Step 5. Set t = t + 1, go to step 2.


Note that if q(x, y) is symmetric (i.e., q(x, y) = q(y, x)), then the Metropolis-Hastings algorithm
reduces to the Metropolis algorithm. In practice, other forms of acceptance/rejection functions
have also been suggested. Observe that in the Metropolis-Hastings algorithm, as in the Metropolis algorithm,
it is not necessary to know the pdf completely; all that is necessary is to know the ratio $\pi(y)/\pi(x)$. Thus,
none of the multiplicative constants in the pdf, $\pi$, plays a role in the algorithm.


Because of the versatility of this method, there are many generalizations of the Metropolis-Hastings
algorithm in the literature. It is also necessary to impose some conditions both on $\pi$ and on the
proposal distribution q for $\pi$ to be the limiting distribution of the Markov chain $\{X_t\}$ produced by
the M-H algorithm. In addition, we do not want a large proportion of the proposed new values to be rejected. Discussion
of these issues is beyond the scope of this book.


Example 13.5.3
Using the Metropolis-Hastings algorithm, generate a sample from the following distribution. Let $\Omega$ =
{2, 3, ..., 11, 12}, which represents the sum of the up faces of two balanced dice, and let the distribution
be given by


Sum i    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9    | 10   | 11   | 12
$\pi(i)$ | 1/36 | 2/36 | 3/36 | 4/36 | 5/36 | 6/36 | 5/36 | 4/36 | 3/36 | 2/36 | 1/36


Using the nominating matrix

$$a_{2,2} = a_{12,12} = \frac{1}{2}, \qquad a_{ij} = \begin{cases} 1/2, & j = i - 1 \\ 1/2, & j = i + 1 \\ 0, & \text{otherwise} \end{cases} \qquad (i, j \in \Omega),$$

write the M-H algorithm to generate samples from the distribution $\pi$.


Solution 
Suppose we start with state $i \in \Omega$, say at 5 (starting at any other state is fine).

Step 1. Generate j from the nominating distribution $\{a_{ij},\ j = 1, 2, \ldots\}$. Thus, j = i - 1 or i + 1, and
in this case j has to be 4 or 6. We can follow the same procedure as in Example 13.5.1 to choose
between i - 1 and i + 1. Let us say we got j = i + 1, here 6.

Step 2. Set $r = \dfrac{\pi(j)\, a_{ji}}{\pi(i)\, a_{ij}}$. In this case,

$$r = \frac{\pi(6)\, a_{65}}{\pi(5)\, a_{56}} = \frac{(5/36)(1/2)}{(4/36)(1/2)} = \frac{5}{4}$$

(if we had chosen 4, then $r = \dfrac{3/36}{4/36} = \dfrac{3}{4}$).

Step 3. If $r \ge 1$, set $x_n = j$. Here r > 1; hence, we accept the new state, $x_n = 6$. Otherwise generate u
from U(0, 1); if u < r, set $x_n = j$, else set $x_n = x_{n-1}$.

Step 4. Set n = n + 1, and go to step 1.
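The steps of this example can be sketched in code as follows. This is an illustrative implementation under stated assumptions, not the book's program: the names `propose` and `mh_dice`, the seed, and the chain length are hypothetical. Because the nominating probabilities between neighbouring states are both 1/2, the ratio r reduces to $\pi(j)/\pi(i)$; a proposal that would leave {2, ..., 12} is treated as a proposal to stay, which corresponds to $a_{2,2} = a_{12,12} = 1/2$.

```python
import random
from collections import Counter

STATES = list(range(2, 13))
# Target: distribution of the sum of two fair dice, pi(s) = (6 - |s - 7|) / 36.
PI = {s: (6 - abs(s - 7)) / 36 for s in STATES}

def propose(i, rng):
    """Nominating step: i - 1 or i + 1 with probability 1/2 each; stay put at the ends."""
    j = i + rng.choice((-1, 1))
    return i if j < 2 or j > 12 else j

def mh_dice(n_steps, start=5, seed=0):
    """Run the discrete M-H chain and return the visited states."""
    rng = random.Random(seed)
    x, chain = start, []
    for _ in range(n_steps):
        j = propose(x, rng)
        r = PI[j] / PI[x]  # a_ij = a_ji = 1/2 cancel in the ratio
        if r >= 1 or rng.random() < r:
            x = j
        chain.append(x)
    return chain

chain = mh_dice(200_000)
freq = Counter(chain)
estimate = {s: freq[s] / len(chain) for s in STATES}
```

Comparing `estimate` with `PI` gives a rough check that the long-run frequencies of the chain approach the target probabilities.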


Example 13.5.4 
Write a Metropolis-Hastings algorithm to generate samples from N(0, 1) based on the proposal U[-1, 1].


Solution
Note that in order for y to be generated based on U[-1, 1], we need $y - x^{(t)} \sim U[-1, 1]$. Thus,
$y \sim U[x^{(t)} - 1,\ x^{(t)} + 1]$. Figure 13.2 shows the target distribution as the standard normal, and the representative
proposals that are uniform at points x = -2 and 2.

FIGURE 13.2 Normal target and uniform proposal distributions.

Now, the M-H algorithm can be obtained in the following way.
Set

$$\alpha(x, y) = \min\left\{\frac{\pi(y)\, q(y, x)}{\pi(x)\, q(x, y)},\ 1\right\} = \min\left\{\exp\left\{(x^2 - y^2)/2\right\},\ 1\right\}.$$

Generate $u \sim U[0, 1]$.

If $u < \alpha(x^{(t)}, y)$, set $x^{(t+1)} = y$; otherwise set $x^{(t+1)} = x^{(t)}$. Continue.
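A minimal sketch of this sampler is given below. The function name `mh_normal_uniform`, the seed, the chain length, and the burn-in of 1,000 draws are illustrative assumptions and are not taken from the text. The uniform proposal is symmetric, so the acceptance probability is min{exp((x² - y²)/2), 1}.

```python
import math
import random

def mh_normal_uniform(n_steps, x0=0.0, seed=1):
    """M-H chain targeting N(0, 1) with proposal y ~ U[x - 1, x + 1]."""
    rng = random.Random(seed)
    x, chain, accepted = x0, [], 0
    for _ in range(n_steps):
        y = rng.uniform(x - 1.0, x + 1.0)
        # alpha(x, y) = min{exp((x^2 - y^2) / 2), 1}
        alpha = math.exp(min(0.0, (x * x - y * y) / 2.0))
        if rng.random() < alpha:
            x = y
            accepted += 1
        chain.append(x)
    return chain, accepted / n_steps

chain, acc_rate = mh_normal_uniform(100_000)
burn_in = 1_000  # illustrative choice, not a recommendation
kept = chain[burn_in:]
mean = sum(kept) / len(kept)
var = sum((v - mean) ** 2 for v in kept) / (len(kept) - 1)
```

After discarding the burn-in draws, the sample mean and variance should be close to 0 and 1, although successive draws are correlated and so carry less information than an independent sample of the same size.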


Observe that in order to generate normal random variables, it is not necessary to use M-H algorithms.
Most statistical software packages will give us a random sample from the normal distribution.
Example 13.5.4 (originally suggested by Hastings in 1970) is given for demonstration of the M-H
algorithm. The algorithm is effective in general cases, for instance, to generate a sample from a gamma
distribution. In Gamma($\alpha$, $\beta$), if $\alpha$ is an integer, we can use the method of Project 4A to generate a
random sample. However, if $\alpha$ is not an integer, we could use Gamma([$\alpha$], $\beta$) (here [$\alpha$] denotes the
integer part of $\alpha$) as the proposal distribution, and follow the steps of the M-H algorithm to generate
a sample from Gamma($\alpha$, $\beta$) (see Exercise 13.5.3).
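The gamma case can be sketched as follows, assuming that the second parameter is a scale parameter shared by the target and the proposal (the text does not state the parameterization) and that the shape is at least 1. Because the proposal Gamma([α], β) does not depend on the current state, the ratio $\pi(y)q(y, x)/(\pi(x)q(x, y))$ simplifies to $(y/x)^{\alpha - [\alpha]}$. The function name `mh_gamma`, the seed, and the parameter values are hypothetical; `random.gammavariate(shape, scale)` is the standard-library gamma generator.

```python
import math
import random

def mh_gamma(shape, scale, n_steps, seed=5):
    """Independence M-H sampler for Gamma(shape, scale) with non-integer shape >= 1."""
    rng = random.Random(seed)
    k = math.floor(shape)           # integer part [alpha]; assumed to be at least 1
    delta = shape - k               # exponent alpha - [alpha] in the acceptance ratio
    x = rng.gammavariate(k, scale)  # start from a proposal draw
    chain = []
    for _ in range(n_steps):
        y = rng.gammavariate(k, scale)  # proposal ignores the current state
        r = (y / x) ** delta
        if r >= 1 or rng.random() < r:
            x = y
        chain.append(x)
    return chain

chain = mh_gamma(shape=2.5, scale=1.0, n_steps=50_000)
sample_mean = sum(chain) / len(chain)
```

Under the scale parameterization assumed here, the mean of Gamma(2.5, 1) is 2.5, which gives a simple check on the output.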


13.5.3 Gibbs Algorithm 


The name Gibbs algorithm (or Gibbs sampler) was coined by Geman and Geman in 1984. In the Gibbs
sampler, only one parameter is varied at a time, while all others are held fixed. The parameter is then
randomly drawn from a conditional probability density function, that is, the probability distribution of
one parameter given all other parameters, $\pi(x_i \mid x_{-i})$, where $x_{-i}$ is the full set of parameters excluding
only the single component $x_i$. Let $x = (x_1, \ldots, x_k)$ be $k\ (\ge 2)$-dimensional. Recall from Chapter 3
that these conditional densities can be obtained as follows:

$$\pi(x_i \mid x_{-i}) = \pi(x_i \mid x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k) = \frac{\pi(x_1, \ldots, x_k)}{\int \pi(x_1, \ldots, x_k)\, dx_i}.$$


The basic assumption under which the Gibbs algorithm works is that we could easily draw a random
sample from these conditional pdfs. The Gibbs algorithm is a particular case of Metropolis-Hastings
algorithms. For example, at the ith step, $y_i$ is generated from the nominating density $q_i(x_i, y_i)$,
where $q_i$ depends on the current state $x_i$. The candidate $y_i$ is accepted with probability

$$\alpha_i(x_i, y_i) = \min\left\{\frac{\pi_i(y_i)\, q_i(y_i, x_i)}{\pi_i(x_i)\, q_i(x_i, y_i)},\ 1\right\},$$

where $\pi_i$ denotes the conditional density of the ith component given the others.


If $y_i$ is accepted, we set the ith component of the new vector to $y_i$, that is, $x_{n+1,i} = y_i$; otherwise set $x_{n+1,i} = x_{n,i}$. The remaining
components of $x_n$ are not changed in step i. This is repeated for each i, at the end of which the entire
vector $x_n$ would have been updated. Thus, if we are in state x at time t, at time t + 1 we either remain
at x or go to y by modifying only one component of x. It is important to use the most recent values
of updated components to update the next component. That is, given $x^{(t)} = (x_1^{(t)}, \ldots, x_k^{(t)})$ at time t,
generate

$$x_1^{(t+1)} \sim \pi\left(x_1 \mid x_2^{(t)}, \ldots, x_k^{(t)}\right)$$
$$x_2^{(t+1)} \sim \pi\left(x_2 \mid x_1^{(t+1)}, x_3^{(t)}, \ldots, x_k^{(t)}\right)$$
$$\vdots$$
$$x_{k-1}^{(t+1)} \sim \pi\left(x_{k-1} \mid x_1^{(t+1)}, \ldots, x_{k-2}^{(t+1)}, x_k^{(t)}\right)$$
$$x_k^{(t+1)} \sim \pi\left(x_k \mid x_1^{(t+1)}, \ldots, x_{k-1}^{(t+1)}\right).$$

FIGURE 13.3 Gibbs updating procedure.

For instance, let k = 2. The Gibbs sampler updates in the following manner. Start at $x^{(0)} = (x_1^{(0)}, x_2^{(0)})$;
first update $x_1^{(0)}$ to $x_1^{(1)}$; using this updated value $x_1^{(1)}$ and $x_2^{(0)}$, update $x_2^{(0)}$ to $x_2^{(1)}$, resulting in the
updated vector $x^{(1)}$. Repeat this procedure to obtain $x^{(2)}, x^{(3)}, \ldots$. Figure 13.3 depicts this updating
procedure.
The conditional densities $f_1, \ldots, f_k$ are called the full conditionals. In the Gibbs sampler, only these
conditional densities are needed for simulation. Thus, this procedure becomes very efficient when
the dimension of the vector x is large, because all of the simulations can be done as univariate.


The following example of a bivariate density is popularly used in the literature to illustrate the Gibbs
sampler. It is a case where the joint density is complex, because one variable (x) is discrete, while the
other variable (y) is continuous. However, the conditional densities are simple known distributions,
binomial and beta distributions, respectively. It is then easier to simulate these distributions, thus
demonstrating the power of the Gibbs sampler.


Example 13.5.5
(a) Write a Gibbs sampler for generating samples from the following bivariate density:

$$f(x, y) \propto \binom{n}{x}\, y^{x + \alpha - 1} (1 - y)^{n - x + \beta - 1} \quad \text{for } x = 0, 1, \ldots, n \text{ and } 0 \le y \le 1.$$

(b) Starting with $y_0 = 1/4$, n = 15, and $\alpha = 1$, $\beta = 2$, obtain the first three realizations of the Gibbs
sequence.


Solution 
(a) From Exercise 3.3.14, we know that

$$f(x \mid y) \propto \binom{n}{x}\, y^{x} (1 - y)^{n - x}.$$

That is, the conditional distribution of x (treating y as a constant) is binomial with parameters n
and y, 0 < y < 1. Also,

$$f(y \mid x) \propto y^{x + \alpha - 1} (1 - y)^{n - x + \beta - 1}.$$

Thus, the conditional distribution of y given x is a beta distribution with parameters $x + \alpha$ and
$n - x + \beta$. The Gibbs sampler for generating bivariate samples from f(x, y) is then given as follows:
For i = 1, ..., N (the number of iterations), repeat:


1. Generate $y_i$ from $f_{Y|X}(\cdot \mid x_{i-1})$, that is, from Beta($x_{i-1} + \alpha$, $n - x_{i-1} + \beta$).
2. Generate $x_i$ from $f_{X|Y}(\cdot \mid y_i)$, that is, from binomial(n, $y_i$).
3. Return $(x_i, y_i)$.


(b) We proceed with the following steps.
(i) For $y_0 = 1/4$, $x_0$ is obtained by generating a random variable from the binomial distribution with n = 15,
$y_0 = 1/4$, that is, from B(15, 1/4), resulting in a value of 4 (generated using Minitab; you may
get a different value when you do it). Thus, $x_0 = 4$.
(ii) Generate $y_1$ randomly from

Beta($x_0 + \alpha$, $n - x_0 + \beta$) = Beta(4 + 1, 15 - 4 + 2) = Beta(5, 13),

resulting in $y_1 = 0.53$ (rounded to two decimal places). Now $x_1 \sim B(15, 0.53)$, resulting in
$x_1 = 6$.
(iii) Generate $y_2$ randomly from

Beta($x_1 + \alpha$, $n - x_1 + \beta$) = Beta(7, 11),

resulting in $y_2 = 0.30$. Now $x_2 \sim B(15, 0.30)$, resulting in $x_2 = 3$.
Thus, a particular realization of the Gibbs sampler for the first three iterations is (4, 0.25), (6, 0.53), and
(3, 0.30).
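The Gibbs iteration of this example can be sketched with the Python standard library as follows. The function name `gibbs_beta_binomial`, the seed, the number of iterations, and the burn-in length are illustrative assumptions; the binomial draw is built from Bernoulli trials so that no external package is needed. The generated values will differ from the Minitab values quoted in the example.

```python
import random

def gibbs_beta_binomial(n_iter, n=15, alpha=1.0, beta=2.0, y0=0.25, seed=2):
    """Gibbs sampler alternating y | x ~ Beta(x + alpha, n - x + beta) and x | y ~ binomial(n, y)."""
    rng = random.Random(seed)
    # x_0 is drawn from binomial(n, y_0), as in part (b) of the example.
    x = sum(rng.random() < y0 for _ in range(n))
    y = y0
    samples = [(x, y)]
    for _ in range(n_iter):
        y = rng.betavariate(x + alpha, n - x + beta)  # update y given the current x
        x = sum(rng.random() < y for _ in range(n))   # update x given the new y
        samples.append((x, y))
    return samples

samples = gibbs_beta_binomial(20_000)
kept = samples[1_000:]  # illustrative burn-in
mean_x = sum(x for x, _ in kept) / len(kept)
mean_y = sum(y for _, y in kept) / len(kept)
```

Summing the joint density over x shows that the marginal of y is Beta(α, β), so with α = 1 and β = 2 the long-run mean of y should be near 1/3 and that of x near nα/(α + β) = 5.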


From Exercise 13.5.9, it can be observed that at the beginning, the values of the chain are highly
dependent on the choice of the initial value $y_0$. In practice, it is necessary to run a sufficient number
of iterations to remove the effect of the starting values. Even though the Gibbs sampler is a special case
of the Metropolis-Hastings algorithm, it is important to observe that unlike the M-H algorithm, every
sample generated by the Gibbs algorithm is accepted. Also, we should have at least a two-dimensional
problem for the Gibbs sampler to be used.


From the previous discussions, we can see that a general description of an MCMC method can be
summarized in the following algorithm.

Initialize X(0)
For i = 1, ..., N repeat
    x = X(i - 1);
    Generate y from a nominating density, q(x, y);
    Calculate the acceptance rate, $\alpha(x, y)$;
    Generate U from the uniform U(0, 1);
    If (U < $\alpha(x, y)$) set X(i) = y,
    Else set X(i) = x;
End;


If we choose a nominating density q(x, y) and an acceptance rate $\alpha(x, y)$ such that the reversibility
condition

$$\pi(x)\, \alpha(x, y)\, q(x, y) = \pi(y)\, \alpha(y, x)\, q(y, x)$$

is satisfied, then the foregoing procedure generates a Markov chain with limiting distribution $\pi(x)$.
In order to use Gibbs sampling for Bayesian analysis, we must have an explicit analytical posterior
conditional distribution.


13.5.4 MCMC Issues 


Two major issues in MCMC are convergence and burn-in. Because in all three MCMC algorithms
we start the sequence from an arbitrary point, any particular sequence may take some time to pass
through the transient stage before the effect of the starting value becomes small enough to be ignored, that
is, before it attains convergence. In practice, we will have to run the algorithm for a few thousand iterations
so that the effect of this initial state is negligible. The samples obtained during this burn-in period
should be discarded for the subsequent analysis as they do not represent the target pdf. By monitoring
the sequence itself, we can determine whether the sequence has reached convergence. A simple
way to decide how much burn-in is necessary is to create scatterplots of $X_i$ versus $X_j$, $i \ne j$. When
the wild variations stop, it is safe to assume that the chain has reached stationarity.


Another major issue in the implementation of MCMC algorithms is the choice of proposal density.
In the continuous case, popular choices among others are the multivariate normal density and the
multivariate t with specified parameters. Even in these cases, there is the question of the appropriate size
of the spread, or scale, of the proposal density. The size of the acceptance ratio is another issue. If
the ratio is too small, the samples will get stuck (because almost all proposed new states will be
rejected), and if the ratio is too high, the samples will show tracking. A general rule of thumb is that
the acceptance ratio should be within 30% to 60%. If not, adjust the step size (for a small ratio,
decrease the step size, and for a high ratio, increase the step size). There are many publications devoted
to these issues.
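The effect of the step size on the acceptance ratio can be explored with a small experiment. The sketch below is an illustration under assumptions, not a tuning recipe: it uses a standard normal target with a uniform random-walk proposal of half-width w, and the function name `acceptance_rate`, the seed, and the grid of half-widths are hypothetical.

```python
import math
import random

def acceptance_rate(half_width, n_steps=20_000, seed=3):
    """Fraction of accepted proposals for a N(0, 1) target with y ~ U[x - w, x + w]."""
    rng = random.Random(seed)
    x, accepted = 0.0, 0
    for _ in range(n_steps):
        y = x + rng.uniform(-half_width, half_width)
        # Symmetric proposal: accept with probability min{pi(y) / pi(x), 1}.
        if rng.random() < math.exp(min(0.0, (x * x - y * y) / 2.0)):
            x, accepted = y, accepted + 1
    return accepted / n_steps

# Small steps are almost always accepted; large steps are mostly rejected.
rates = {w: acceptance_rate(w) for w in (0.1, 1.0, 3.0, 10.0)}
```

Printing `rates` shows the acceptance ratio falling as the half-width grows, which is the behavior the rule of thumb relies on when it asks for a smaller step size after too many rejections.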


For Bayesian computation, MCMC allows us to sample from any posterior. Because of the availability
of specialized software packages, such as BUGS, it is practical to code up a sampler for a particular
problem.

There are many references including books on MCMC methods; some of these are listed in the
references at the end of this book. A good nontechnical discussion on various aspects of MCMC can
be found at http://www.amstat.org/publications/tas/kass.pdf. For a good discussion including some
technical details, refer to http://www.csss.washington.edu/Papers/wp9.pdf.


EXERCISES 13.5 


13.5.1. For Example 13.5.1, let $\lambda = 3$. Starting with initial state $x_0 = 6$, compute relevant quantities
performing 10 iterations of the algorithm.


13.5.2. Using the Metropolis-Hastings algorithm, generate a random sample from a geometric
distribution with parameter $\theta$. Use the nominating distribution $\{a_{ij},\ j = 1, 2, \ldots\}$ such that

$$a_{ij} = \begin{cases} 1/2, & j = i - 1,\ i + 1, \text{ and } i = 1, 2, 3, \ldots \\ 1/2, & j = 0, 1 \text{ and } i = 0 \\ 0, & \text{otherwise.} \end{cases}$$

[Recall that if X is geometric with parameter $\theta$, then $P(X = x) = (1 - \theta)^x \theta$, for
x = 0, 1, 2, ...]


13.5.3. Write down the Metropolis-Hastings algorithm to generate a sample from Gamma($\alpha$, $\beta$)
using the proposal density as Gamma([$\alpha$], [$\alpha$]/$\alpha$).


13.5.4. Write down the Metropolis-Hastings algorithm for simulating a Markov chain with
stationary distribution $\pi$ = (1/6, 2/3, 1/6), using the “proposal” transition matrix

$$Q = \begin{pmatrix} 1/2 & 1/2 & 0 \\ 1/2 & 0 & 1/2 \\ 0 & 1/2 & 1/2 \end{pmatrix}$$


13.5.5. In tossing three fair coins, let the random variable X be defined as X = number of tails.
Then the distribution of X is given by


x        | 0   | 1   | 2   | 3
$\pi(x)$ | 1/8 | 3/8 | 3/8 | 1/8


Write down the Metropolis or Metropolis-Hastings algorithm for simulating a Markov
chain with stationary distribution $\pi(x)$. Use any nominating matrix.

13.5.6. Write a Metropolis algorithm to generate samples from a target distribution, z(x) « 
exp (- +), based on the proposal 


a _o-%x? 
ci ameiaes WT Te

13.5.7. Write a general Metropolis or Metropolis-Hastings algorithm to generate a sample from a
target distribution $\pi$, where $\pi$ is an exponential distribution with parameter $\theta$.


13.5.8. Write a general Metropolis or Metropolis-Hastings algorithm to generate a sample from
a target distribution $\pi$, where $\pi(x) \propto x^{34}(1 - x)^{38}(2 + x)^{125}$. Use the proposal density
q(x, y) = 1 on the interval [0, 1].


13.5.9. For the bivariate density given in Example 13.5.5, starting with three different values of $y_0$,
say, 1/3, 1/2, and 2/3, with n = 15, $\alpha = 1$, and $\beta = 2$, obtain the first three realizations of the
Gibbs sequence. Comment on the influence of the initial values.


13.5.10. Consider a problem of sampling bivariate random variables with joint density given by 


ce byt 4xy) x> 0, y> (0) 
fy) = 


0, otherwise. 


(a) Find f (x|y) and f (y|x). 

(b) Write a Gibbs procedure to generate samples from this distribution. Discuss why it is 
easier to use the Gibbs sampler for this case. 

(c) Starting from an arbitrary point, obtain the first three sample points. 


13.5.11. Suppose the target distribution is 


wn-n((2) (08) 


Then write the Gibbs sampler to generate a sample from this distribution. In particular, 
say, we start with (X, Y) = (12, 12) and p = 0.7. What is the Gibbs procedure to generate 
a sample from a binormal distribution? 


13.5.12. Suppose the target distribution is 


wn-s((3)0 9) 


Then write the Gibbs sampler to generate a sample from this distribution. 


13.6 CHAPTER SUMMARY 


In this chapter, we introduced some empirical methods that are becoming increasingly popular in 
modern statistical analysis. The methods presented must be viewed as introductory in nature and 
by no means most efficient or general. Because of ever-evolving applications and advancements 
in technology, most of the methods presented here also evolve. Also, based on the situation, it is 
necessary to write computer codes to run the algorithms introduced in this chapter. Our hope is that 
students will explore these topics in more detail by referring to specialized books and publications. 


In this chapter, we also learned the following important concepts and procedures:

The jackknife method

General bootstrap procedure to estimate the standard error of $\hat{\theta}$

Bootstrap confidence intervals

EM algorithm

Markov chain Monte Carlo methods

Metropolis algorithm

Metropolis-Hastings algorithm

Gibbs sampler


13.7 COMPUTER EXAMPLES 


Most of the procedures described in this chapter could be implemented using Minitab, SAS, or SPSS 
only in special cases. For instance, there are special macros available to run Monte Carlo methods 
for each of these software packages. In SAS, we could use the so-called MI (multiple imputation) 
procedure for MCMC as well as EM methods if the data are multivariate normal. Unlike the many 
methods discussed in earlier chapters, in general there are no simple pull-down menus available to 
use the methods discussed in this chapter. 


There are other specialized programs that will do a good job of implementing the methods discussed
in this chapter. BUGS (Bayesian inference Using Gibbs Sampling) is free software that has
proven to be effective in MCMC computations, and the details are at the Web site
http://www.mrc-bsu.cam.ac.uk/bugs/. Most of the procedures discussed in this chapter can also be implemented in
“R,” which is also free software that can be downloaded from http://www.r-project.org/.




Example 13.7.1 
For the data of Example 13.3.2, give the Minitab steps. 


Solution 
Enter the data in C1. Enter 0.08 (≈ 1/12) 12 times in C2. Then


Calc > Random Data > Discrete... > Generate [enter 200] rows of data > Store in column(s):
enter C3-C14 > values in: enter C1 > Probabilities in: enter C2 > click OK


We will get 200 rows of data stored in 12 columns. Because the data are generated randomly from the
original data with replacement, we treat each row (across C3-C14) as one bootstrap sample of size 12 and the 200
rows as the number of samples. Thus N = 200, and n = 12. Now for each row we can find the mean,
$\bar{X}_i$, by doing the following.


Calc > Row Statistics... > click Mean > in Input variables: enter C3-C14 > store results in: enter 
C15 > click OK 


We will get 200 values representing the sample means. To get the bootstrap mean,

Stat > Basic Statistics > Display Descriptive Statistics... > Variables: enter C15 > click OK


The value in the mean is the bootstrap mean, and the value in the standard deviation is the bootstrap 
standard deviation. 
If we want to get, say, a 95% confidence interval, first sort the sample means in ascending order:


Manip > Sort... > Sort column(s): enter C15 > store sorted column(s) in: enter C16 > sorted by 
column: enter C15 > click OK 


Calculate the values of 0.025 x (N + 1) = 0.025 x 201 = 5.025 and 0.975 x (N+ 1) = 0.975 x 201 = 
195.975. Approximating these values to the nearest integer, we get 5 and 196, respectively. The lower 
confidence limit will be the fifth entry in the sorted means, and upper confidence limit will be the 196th value 
in the sorted means. 



If we want to obtain a confidence interval for the median, we follow very much the same steps as 
before, but instead of using the mean in the procedure, we substitute the median. For example: 


Calc > Row Statistics... > click Median > in Input variables: enter C3-C14 > store results in: enter 
C15 > click OK 


The rest of the steps are similar. 
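For readers without Minitab, the same bootstrap steps can be sketched in Python. This is an illustration only: the data of Example 13.3.2 are not reproduced here, so the twelve values in `data` are hypothetical, as are the function name `bootstrap` and the seed. The percentile positions follow the rule used above, 0.025 × (N + 1) and 0.975 × (N + 1) rounded to the nearest integer.

```python
import random
import statistics

def bootstrap(data, stat, n_boot=200, seed=4):
    """Resample with replacement n_boot times and summarize the statistic `stat`."""
    rng = random.Random(seed)
    n = len(data)
    # Each replicate is the statistic computed on one resample of size n.
    reps = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lo_pos = round(0.025 * (n_boot + 1))  # 5 when n_boot = 200
    hi_pos = round(0.975 * (n_boot + 1))  # 196 when n_boot = 200
    return {
        "mean": statistics.fmean(reps),
        "se": statistics.stdev(reps),
        "ci": (reps[lo_pos - 1], reps[hi_pos - 1]),  # positions are 1-based
    }

# Hypothetical sample of size 12, standing in for the data of Example 13.3.2.
data = [12.1, 9.8, 11.4, 10.6, 13.2, 8.9, 10.1, 12.7, 11.0, 9.5, 10.9, 11.8]

res_mean = bootstrap(data, statistics.fmean)
res_median = bootstrap(data, statistics.median)
```

Passing `statistics.median` instead of `statistics.fmean` gives the interval for the median, mirroring the substitution of Median for Mean in the Minitab row statistics.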


13.7.1 SAS Examples 


There are %JACK and %BOOT macros available to do jackknife and bootstrap computations.
A good site with example programs from SAS Institute is
http://ftp.sas.com/techsup/download/stat/jackboot.html. Sometimes, PROC IML could also be used to bootstrap.
In the case of multivariate normal data, PROC MI with the EM option will perform the EM algorithm in SAS; refer
to http://support.sas.com/rnd/app/da/new/802ce/stat/chap9/index.htm for technical details. Refer
to http://support.sas.com/rnd/app/da/new/802ce/stat/chap9/sect8.htm for a table that summarizes
the options available for the MCMC statement. Example SAS codes could be obtained from a simple
search of the Web for almost all the procedures explained in this chapter.


PROJECTS FOR CHAPTER 13 


13A. Bootstrap Computation 


Use any statistical computer program to generate random numbers. By specifying a particular distribution,
such as normal with mean 0 and variance 1 or other similar distributions, we can then
generate numbers that follow this distribution. (This can be done either directly, if your software
allows, or by the method described in Project 4A.)

(a) Use such a package to generate 200 numbers from an N(0, 1) distribution. Then calculate the
sample mean and sample variance. (They will be slightly off from the actual mean and variance.
From this, we can draw the conclusion that the estimates of parameters that are
computed using the data set are not necessarily the true parameters, but often are reasonable
guesses.) Using these values, calculate an estimate of the standard error.

(b) Now for the same data, pretend that we are not really sure what the distribution is. Then, we
could consider letting the observed data specify what the distribution is. This is the essence
of bootstrapping. In particular, we sample with replacement from the distribution that we have
observed (the empirical distribution of the data), in order to study the possible estimates that
might have resulted from a similar sample (same data observations, but in possibly different
quantities). Using the bootstrap algorithm described in Section 13.3, obtain a bootstrap
estimate of the standard error and compare this with the estimate obtained in part (a).

Objective: In this chapter we discuss some general concepts and useful methods with applications

Florence Nightingale (1820-1910) is most remembered as a pioneer of nursing and a reformer
of hospital sanitation methods. Her statistical contributions caused Karl Pearson to acknowledge
Nightingale as a “prophetess” in the development of applied statistics. Nightingale used data as a
tool for improving medical and surgical practices. During the Crimean War, she plotted the incidence
of preventable deaths in the military and introduced polar-area charts to demonstrate the unnecessary
deaths due to unsanitary conditions. With her analysis, Florence Nightingale showed the need
for reform and revolutionized the idea that social phenomena could be objectively measured and
subjected to mathematical analysis. In addition, she developed a Model Hospital Statistical Form
for hospitals to collect and generate data and statistics. She became a Fellow of the Royal Statistical
Society in 1858 and an honorary member of the American Statistical Association in 1874.


14.1 INTRODUCTION 


Basically, there can be three major problems in applying the statistical methods that we have studied in 
the previous chapters to real-world problems. These involve sources of bias, errors in methodology, and 
the interpretation of the analytical results. Bias occurs in situations or conditions that affect the validity 
of statistical results. In order for the statistical inferences to be valid, the observed sample must be 
representative of the target population, and the observed variables must conform to assumptions 
that underlie the statistical procedures to be used. Of course, the statistical methodology chosen
must also be appropriate for the problem under study. We must be careful with the interpretation of
the statistical results. For example, in a regression problem, a cause-and-effect relationship may not
be warranted, and in a hypothesis testing problem, we should not accept the null hypothesis without
exploring the probability of a type II error. If we present the results graphically, the graphs should be
accurate and should reflect the data variations clearly.


In this textbook, we have assumed that a data set is available to us: Either it is a small data set that we 
can handle without much effort, or it is in a computer-readable file. In practical situations, the proper 
handling of a statistical data set is not an easy task. Going from a stack of disorganized hard copy to 
online data that are trustworthy, that is, to input, debug, and manipulate the data, is a problem one 
will face even before one starts the statistical analysis. Here, we will not be dealing with these issues. 
Interested readers should refer to the references at the end of this book for further study on these 
aspects. 


It is not our aim to discuss comprehensively all the problems that come up in applications. Most of 
the material presented in this chapter has already been discussed in various parts of the book. The 
purpose of this chapter is to present some methods in a unified way and to discuss generally the 
various ways in which the techniques developed in previous chapters could be applied to real-world 
data. Because the material in this chapter is a collection of available techniques, we will not follow 
the more rigorous pattern of previous chapters, and no proofs will be given. 


14.2 GRAPHICAL METHODS 


We first present some useful graphical methods that were not introduced in Chapter 1 on descriptive
statistics. Graphical analysis is a very important aspect of any statistical study. Before attempting a
complex statistical analysis, summarize the data with a graph. Graphical displays help
in data exploration, analysis, and presentation and in communication of results. In data analysis, one
of the significant steps is to summarize and plot the data. Graphs help in the communication of final
results and recommendations inferred from quantitative models. A statistical model is often suggested
by an initial graphical analysis. Adequacy of statistical models depends on the model conditions.
Because violations of these model assumptions may sometimes occur as nonlinearities, graphical
methods provide an easy and perhaps very effective method of detection. Some examples of graphical
displays are histograms, dotplots, box plots, and scatterplots. Methods of graphing multivariate
data are more complex and include scatterplot matrices and icon plots. These are beyond the level
of this book.


If we have a data set with one variable (univariate), we first create a dotplot and summary of basic 
statistics. In a dotplot, we plot the data as dots (one dot for each observation) above the horizontal 
axis that covers the entire range of observations (see Figure 14.1). The dotplot will provide us with an 
idea of the distribution of the data and any unusual behavior of the data that may not be apparent 
from summary statistics such as mean, median, or standard deviation. The dotplots allow us to 
visualize the entire distribution of the data set by listing each possible outcome and the frequency of 
the variable. Other ways of summarizing univariate data, such as histograms, have been discussed in 
Chapter 1. The histogram differs from the dotplot in that it groups data into categories. We illustrate
these methods with several examples.


Example 14.2.1 
The following data give the lifetime of 30 light bulbs (rounded to nearest hour) of a particular type. 


1122 922 1146 1120 1079 905 1095 977 1138 966 
1150 977 1137 1088 1139 1055 1082 1053 1048 1132 
1088 996 1102 1028 1130 1002 990 1052 1116 1135 


Construct a dotplot. 


Solution 
Figure 14.1 is the dotplot for these data. 


FIGURE 14.1 Dotplot for lifetime of light bulbs (horizontal axis: lifetime in hours, from 910 to 1155).


The dotplot suggests a distribution that is skewed to the left (negatively skewed): most of the observations are
located toward the right, with a longer tail extending toward the smaller values.
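A quick numerical check of the shape seen in the dotplot is to compare the mean with the median and to compute the moment coefficient of skewness; a negative coefficient indicates a longer tail toward the small values. The sketch below uses the 30 lifetimes of Example 14.2.1; the variable names are illustrative, and the population standard deviation is used in the coefficient as one common convention.

```python
import statistics

# Lifetimes (hours) of the 30 light bulbs in Example 14.2.1.
lifetimes = [
    1122, 922, 1146, 1120, 1079, 905, 1095, 977, 1138, 966,
    1150, 977, 1137, 1088, 1139, 1055, 1082, 1053, 1048, 1132,
    1088, 996, 1102, 1028, 1130, 1002, 990, 1052, 1116, 1135,
]

mean = statistics.fmean(lifetimes)
median = statistics.median(lifetimes)
sd = statistics.pstdev(lifetimes)
# Moment coefficient of skewness: average cubed standardized deviation.
skewness = sum(((v - mean) / sd) ** 3 for v in lifetimes) / len(lifetimes)
```

For these data the mean falls below the median and the coefficient is negative, which agrees with a tail that extends toward the shorter lifetimes.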


Some of the graphing methods can also be applied to compare two variables, for example, their
frequency distributions. For instance, dotplots could also be used to compare bivariate (two variables)
or multivariate (many variables) data. When we have independent samples, side-by-side box plots could
be used for comparing two sample distributions in terms of their centers, dispersions, and skewnesses.


When there are two variables, a scatterplot is used as one of the basic graphic tools to examine the
relationship between two variables.

FIGURE 14.2 Scatterplot.


The scatterplot in Figure 14.2 for two variables x and y indicates a possible linear relation between x
and y. The strength of the relationship between two variables is often represented through a correlation
statistic. The correlation coefficient is a single number that is easy to calculate
and comprehend, and hence it is often used as the primary statistic of interest; it should be noted, however,
that it measures only the strength of a linear relationship. Scatterplots, in contrast, provide information about the strength
of association, not necessarily linear, between variables. In addition, scatterplots help us understand
other aspects of the data, such as the range. Given n observations on two variables, X and Y, we plot
a character or symbol at n points representing $(x_i, y_i)$. If two or more observations in a scatterplot are
identical, the plotted symbols will coincide, masking possibly important information.


Example 14.2.2 
The following data give the cholesterol levels before a certain treatment and after 4 months of the treatment. 


Before | 235 | 212 | 277 | 262 | 162 | 212 | 226 | 252 | 185 | 276 
216 | 315 | 289 | 283 | 234 | 223 | 275 | 282 | 311 | 285 
After | 233 | 214 | 200 | 266 | 146 | 212 | 238 | 284 | 191 | 247 
244 | 268 | 241 | 289 | 220 | 202 | 221 | 196 | 212 | 247 


Draw a scatterplot. Also find the correlation between before- and after-treatment values.

Solution

Figure 14.3 is a scatterplot of the data.

FIGURE 14.3 Scatterplot for cholesterol levels.

Looking at the scatterplot in Figure 14.3, we see a trend in the cholesterol levels before and after the treatment.
Correlation of before- and after-treatment data is measured by r, where

$$r = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum (x_i - \bar{x})^2 \sum (y_i - \bar{y})^2}}.$$
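The sample correlation for these data can be computed directly from the definition of Pearson's r. The following sketch uses the 20 before and after values of Example 14.2.2, paired in the order in which they are listed; the function name `pearson_r` is illustrative.

```python
import math

# Cholesterol levels from Example 14.2.2, paired in the order listed.
before = [235, 212, 277, 262, 162, 212, 226, 252, 185, 276,
          216, 315, 289, 283, 234, 223, 275, 282, 311, 285]
after = [233, 214, 200, 266, 146, 212, 238, 284, 191, 247,
         244, 268, 241, 289, 220, 202, 221, 196, 212, 247]

def pearson_r(x, y):
    """Sample correlation: sum of cross-products over the root of the product of sums of squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(before, after)
```

A positive value of `r` is consistent with the upward trend visible in the scatterplot.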


The quantile-quantile (QQ) plot is another useful technique in comparing bivariate data. In a 
QQ plot, the quantiles of the two samples are plotted against each other. For two distributions 
that are almost the same, their quantiles would be nearly equal. As a result, the quantiles would plot 
along the 45-degree line. Deviation of plots from this line can be used to draw inferences about how 
the two samples differ from one another. If the two sample sizes n; and nz are equal, then we can 
draw the QQ plot by graphing the order statistics x(j) and yi) against each other. If the two samples 
are not of the same size, then we can use the following procedure to create the QQ plot. Ifn; >n2, 
then draw the (1/(n; + 1))th quantiles of the two samples against each other. For a large sample, 
they are the order statistics, x71) < ... < x(n,). For the smaller sample sizes, the pth quantile value is 
obtained by using the following formula: 


, Xp(n41)> if p(n + 1), is an integer 
= (14.1) 
X(m) + [p(t 1) — m] (x¢n41) — X¢m))> if p@ + 1), isa fraction 
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where m denotes the integer part of p(n + 1). It should be noted that a QQ plot is not useful for 
paired data because the same quantiles based on the ordered observations do not, in general, come 
from the same pair. 
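The quantile rule (14.1) and the unequal-sample-size procedure can be sketched in a few lines of Python. The function names are hypothetical, and the clamping of positions that fall before the first or after the last order statistic is an assumption of this sketch; the text does not say how to handle those cases.

```python
import math


def quantile_14_1(sample, p):
    """pth sample quantile from the order statistics, following (14.1)."""
    xs = sorted(sample)
    n = len(xs)
    h = p * (n + 1)
    m = math.floor(h)  # integer part of p(n + 1)
    if m < 1:          # assumption: clamp to the smallest observation
        return xs[0]
    if m >= n:         # assumption: clamp to the largest observation
        return xs[-1]
    # When h is an integer the interpolation term vanishes and x_(m) is returned.
    return xs[m - 1] + (h - m) * (xs[m] - xs[m - 1])


def qq_points(sample_a, sample_b):
    """QQ plot coordinates (a-quantile, b-quantile) for two samples."""
    n1 = max(len(sample_a), len(sample_b))
    probs = [i / (n1 + 1) for i in range(1, n1 + 1)]

    def quantiles(sample):
        if len(sample) == n1:
            return sorted(sample)  # larger sample: use the order statistics
        return [quantile_14_1(sample, p) for p in probs]

    return list(zip(quantiles(sample_a), quantiles(sample_b)))


points = qq_points([1, 2, 3, 4, 5], [10, 30])
print(points)
```

Plotting these pairs together with the 45-degree line gives the QQ plot described above.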




Example 14.2.3 
Draw a QQ plot for the data given in Example 14.2.2. 


Solution 
Here $n_1 = n_2 = 20$. First sort the data in ascending order. 


Before | 162 | 185 | 212 | 212 | 216 | 223 | 226 | 234 | 235 | 252 
262 | 275 | 276 | 277 | 282 | 283 | 285 | 289 | 311 | 315 
After | 146 | 191 | 196 | 200 | 202 | 212 | 212 | 214 | 220 | 221 
233 | 238 | 241 | 244 | 247 | 247 | 266 | 268 | 284 | 289 


Because the QQ plot points lie mostly below the 45-degree line, we may conjecture that the cholesterol level 
before is generally higher than that after. 
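The conjecture can be checked numerically by pairing the sorted values and counting how many QQ points lie below the 45-degree line. This is only a sketch of the comparison, with "before" on the horizontal axis and "after" on the vertical axis (an assumption about the orientation of the plot).

```python
# Cholesterol levels from Example 14.2.2.
before = [235, 212, 277, 262, 162, 212, 226, 252, 185, 276,
          216, 315, 289, 283, 234, 223, 275, 282, 311, 285]
after = [233, 214, 200, 266, 146, 212, 238, 284, 191, 247,
         244, 268, 241, 289, 220, 202, 221, 196, 212, 247]

# Equal sample sizes: pair the order statistics x_(i) and y_(i).
pairs = list(zip(sorted(before), sorted(after)))

below = sum(1 for b, a in pairs if a < b)  # points under the 45-degree line
above = sum(1 for b, a in pairs if a > b)  # points over the 45-degree line
print(f"below: {below}, above: {above}, total: {len(pairs)}")
```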

FIGURE 14.4 QQ plot for cholesterol levels. 

We saw in Chapter 1 that box plots could be used for identification of outliers. To summarize, we 
emphasize that graphical procedures, although preliminary, are an integral part of any statistical 
analysis. 


EXERCISES 14.2 


14.2.1. In order to study any possible relationship between expense and return, the following data 
give percentage of expense ratio and total 1-year return for randomly selected stock mutual 
funds for the year 2000 (source: Money, February 2000). 


% expense ratio | 1.03 | 1.80 | 1.90 | 1.53 | 1.03 | 2.06 | 3.20 | 0.49 | 1.10 | 1.07 
1.48 | 1.30 | 1.23 | 1.22 | 1.60 | 1.50 | 1.81 | 1.75 | 0.97 | 1.28 
% return | 7.3 | 9.5 | 32.2 | 11.0 | 19.5 | 7.3 | 25.1 | 10.2 | 1.5 | 7.9 
18.9 | 26.1 | 3.4 | 3.7 | 23.5 | 2.9 | 14.5 | 14.9 | 22.7 | 21.9 


Draw a scatterplot. Also find the sample correlation of percent expense ratio and percent 
return. 


14.2.2. In order to study any possible relationship between age and change in systolic blood pressure 
(BP) (mm Hg) in 24 hours in response to a treatment, the following data were obtained 
from 11 individuals. 


Age | 70 | 51 | 65 | 70 | 48 | 70 | 45 | 48 | 35 | 48 | 30 
Systolic BP change | 28 | -10 | -8 | -15 | -8 | -10 | -12 | 3 | 1 | -5 | 5 


(a) Draw a scatterplot. 

(b) Find the sample correlation of age and systolic BP. 
(c) Fit a least-squares regression line. 

(d) Interpret (a), (b), and (c). 


14.2.3. The following data represent 15 randomly selected state finances: revenue and expenditures 
(in millions of dollars) for the fiscal year 1997 (source: The World Almanac and Book of Facts 
2000). 


Revenue: | 9,439 | 8,845 | 14,520 | 24,028 | 39,038 | 5,215 | 20,128 | 7,467 
26,538 | 5,537 | 6,494 | 2,818 | 49,318 | 4,229 | 7,724 


Expenditure: | 5,722 | 7,685 | 13,862 | 21,975 | 35,302 | 4,441 | 16,200 | 7,145 
25,791 | 4,808 | 5,130 | 2,426 | 39,296 | 4,002 | 6,818 


(a) Draw a scatterplot. 

(b) Find the sample correlation between revenue and expenditure. 
(c) Draw a QQ plot. 

(d) Interpret (a), (b), and (c). 

14.2.4. The following data give birth rates (per 1000 population) for 20 selected states in 1998 
(source: The World Almanac and Book of Facts 2000). 


14.4 16.3 13.5 14.6 13.7 15.6 10.9 12.8 13.0 14.2 
13.4 13.9 15.9 13.3 14.1 15.7 15.2 13.9 15.4 11.3 


Construct a dotplot and interpret. 


14.2.5. The following data give the median prices (rounded to nearest $1000) of single-family 
homes for 18 randomly selected U.S. cities in 1998 (source: The World Almanac and Book of 
Facts 2000). 


128 146 109 90 105 152 79 89 109 
93 108 128 188 158 93 78 123 137 


Construct a dotplot and interpret. 


14.3 OUTLIERS 


All statistical procedures make assumptions about a population and the sample values obtained from 
the population. Before we proceed to analyze the data, we must check to see if there are any outliers, 
that is, data points that do not belong in the data set or are not in line with the rest of the data. 


Outliers are observations that appear to have an abnormal value as compared with the rest of the 
values in the data set; that is, the value of an outlier is either much higher or significantly lower than 
any other value in the data set. An outlier could be a discordant observation or a contaminant. A 
discordant observation is one that appears surprising or discrepant to the investigator and is to some 
extent subjective. A contaminant is an observation that is from a different distribution than the rest 
of the data. Outliers may occur as a result of some limitations on measuring techniques or recording 
errors. They may also be due to the sample not being entirely from the same population. Extreme 
values in a data set could also be due to a skewed population. It should be noted that sometimes 
a data point that is labeled as an outlier may really be indicative of a novel phenomenon. In these 
cases, an extreme observation may not be classified as an outlier. 


The presence of outliers can dramatically affect the estimate of the mean and variance of the sample, 
especially if the sample size is small. As a result, any test statistic computed from such data would be 
unreliable, and so would be the statistical inferences. For example, presence of outliers might lead to 
an incorrect conclusion that the variances of two samples are not equal if the outlier is the result of 
a recording or measurement error. 


In a controlled experiment, such as in a laboratory setting, good record keeping with a clear under- 
standing of the phenomenon under investigation and information about all the data will minimize 
the occurrence of outliers due to recording errors. 


There are basically two methods that are employed in dealing with outliers. One method is to use 
statistical testing procedures to detect outliers, possibly removing them from the data set and letting 
the analysis deal only with the rest of the data. The second method is to use statistical procedures that 
are immune or only minimally sensitive to the presence of outliers. We now present some commonly 
used tests for labeling outliers. 


In data analysis, it is necessary to label suspected outliers for further study. For normally distributed 
data, we give three simple methods to identify an outlier: z-score, modified z-score, and box plot. 


In the z-score test, first find the z-scores of the entire data set and label any observation with a 
z-score greater than 3 or less than -3 as an outlier. Recall that for the observed values 
$x_1, \ldots, x_n$, the z-score is defined by 

$$z_i = \frac{x_i - \bar{x}}{s},$$

where $s$ is the sample standard deviation, that is, 

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}.$$

Because both the sample mean and the sample standard deviation are affected by the outliers, this 
labeling method is not very reliable. 


In a modified z-score test, the median of absolute deviation about the median (MAD) is used. Let 

$$\text{MAD} = \text{median}(|x_i - m|),$$

where $m$ is the median of the observations. Then the modified z-score is 

$$z_i = \frac{x_i - \bar{x}}{\text{MAD}}.$$

An observation is labeled as an outlier if the absolute value of the corresponding modified z-score 
is greater than 3.5. A normal plot may be used for testing normality for the data. 


If we want a reasonably robust distribution-free test, an observation $x_0$ is labeled as an outlier if 

$$\frac{|x_0 - m|}{\text{MAD}} \ge 5.$$

Here, the choice of 5 is somewhat arbitrary. 


A box plot (also called box-and-whisker plot) gives a method of labeling outliers through a graphical 
representation. We have seen the method of construction of box plots in Chapter 1. A box plot consists 
of a box, whiskers, and outliers. We draw a line across the box at the median. For example, in Minitab, 
the bottom of the box is at the first quartile (Q1) and the top is at the third quartile (Q3). The whiskers 
are the lines that extend from the top and bottom of the box to the adjacent values, the lowest and 
highest observations still inside the region defined by the lower limit Q1 - 1.5(Q3 - Q1) and the 
upper limit Q3 + 1.5(Q3 - Q1). Outliers are points outside the lower and upper limits, plotted with 
asterisks (*). 


Example 14.3.1 
The following data give the hours worked by 25 employees of a company in a randomly selected week. 


45 40 39 36 42 40 55 58 42 41 
48 50 47 54 40 34 18 40 60 56 
42 43 46 43 54 


Label all possible outliers using: 
(a) z-score test, distribution-free test, and modified z-score test. 
(b) Box plot. 

Solution 


(a) We can create Table 14.1, where dfree z stands for the distribution-free scores, and modified stands 
for the modified z-scores. 


Table 14.1 


Data   z-Score    dfree z   Modified 
45     0.05355    0.12      0.12 
40     -0.50427   1.13      -1.13 
39     -0.61583   1.38      -1.38 
36     -0.95053   2.13      -2.13 
42     -0.28114   0.63      -0.63 
40     -0.50427   1.13      -1.13 
55     1.16919    2.62      2.62 
58     1.50389    3.75      3.37 
42     -0.28114   0.63      -0.63 
41     -0.39271   0.88      -0.88 
48     0.38824    0.87      0.87 
50     0.61137    1.37      1.37 
47     0.27668    0.62      0.62 
54     1.05763    2.37      2.37 
40     -0.50427   1.13      -1.13 
34     -1.17366   2.63      -2.63 
18     -2.95868   6.63      -6.63 
40     -0.50427   1.13      -1.13 
60     1.72701    3.87      3.87 
56     1.28076    2.87      2.87 
42     -0.28114   0.63      -0.63 
43     -0.16958   0.38      -0.38 
46     0.16512    0.37      0.37 
43     -0.16958   0.38      -0.38 
54     1.05763    2.37      2.37 


By the z-score test, there are no outliers. Using the distribution-free test, the 18 is the only outlier. By 
the modified z-score test, 18 and 60 are possible outliers. 
(b) The box plot is given in Figure 14.5. 


[Box plot of hours worked per week; vertical axis from 20 to 60.] 

FIGURE 14.5 Box plot for hours of work per week. 


Hence the observation 18 is identified as an outlier using the box plot. 
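The four labeling rules of this section can be applied to the hours data with a short script. This is a minimal sketch using the formulas as stated in the text: the distribution-free score uses the median m, and the modified z-score uses the sample mean in the numerator. The function name `label_outliers` is our own, and the quartiles come from Python's `statistics.quantiles` with its default "exclusive" method, which is assumed here to stand in for the Minitab quartiles.

```python
import statistics

# Hours worked by 25 employees (Example 14.3.1).
hours = [45, 40, 39, 36, 42, 40, 55, 58, 42, 41,
         48, 50, 47, 54, 40, 34, 18, 40, 60, 56,
         42, 43, 46, 43, 54]


def label_outliers(data):
    """Return the observations flagged by each labeling rule."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divisor n - 1)
    m = statistics.median(data)
    mad = statistics.median([abs(x - m) for x in data])
    if mad == 0:
        raise ValueError("MAD is zero; the MAD-based scores are undefined")
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles at positions p(n + 1)
    low = q1 - 1.5 * (q3 - q1)
    high = q3 + 1.5 * (q3 - q1)
    return {
        "z_score": [x for x in data if abs((x - x_bar) / s) > 3],
        "distribution_free": [x for x in data if abs(x - m) / mad >= 5],
        "modified_z": [x for x in data if abs((x - x_bar) / mad) > 3.5],
        "box_plot": [x for x in data if x < low or x > high],
    }


labels = label_outliers(hours)
for rule, flagged in labels.items():
    print(rule, flagged)
```

Because the mean and standard deviation are themselves pulled toward an extreme value, the plain z-score rule tends to flag fewer points than the MAD-based rules.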


Once we identify the outliers, then the question is what to do with them. If we can rule out recording 
errors as the source of outliers, the situation becomes more difficult. It is often impossible to say 
whether an outlier is really an extreme value within a skewed population or whether it represents 
a value drawn from a different population. As we indicated earlier, an outlier can be a legitimate 
observation representing a special feature of the sampled population. In those cases, discarding the 
outliers may simplify the statistical analysis, although it also reduces the usefulness of such analysis. 
Understanding the experiment that generated the data might help in determining whether to discard 
or keep the outliers. 


Once we decide to include the outliers, there are two possible ways to deal with them. One is to 
transform the data, such as by taking the natural logarithm, so as to reduce the undue influence of 
the outliers. Another possibility is to perform the analysis twice, with and without outliers, and report 
both results. 
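The second option, running the analysis with and without the suspected outlier, is easy to sketch. The example below reuses the hours data of Example 14.3.1 and treats the value 18 as the flagged observation; the summary statistics chosen (mean and sample standard deviation) are only an illustration of what might be reported side by side.

```python
import statistics

# Hours worked by 25 employees (Example 14.3.1); 18 was labeled as an outlier.
hours = [45, 40, 39, 36, 42, 40, 55, 58, 42, 41,
         48, 50, 47, 54, 40, 34, 18, 40, 60, 56,
         42, 43, 46, 43, 54]
suspect = 18

without = [x for x in hours if x != suspect]

# (mean, sample standard deviation) for each version of the data set.
summary = {
    "with": (statistics.mean(hours), statistics.stdev(hours)),
    "without": (statistics.mean(without), statistics.stdev(without)),
}

for label, (mean, sd) in summary.items():
    print(f"{label:8s} mean = {mean:.3f}  s = {sd:.3f}")
```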


If we have bivariate data, a scatterplot may reveal any possible outliers; see Figure 14.27. There are 
other methods available to detect outliers in multivariate data. 


EXERCISES 14.3 


14.3.1. Motor vehicle thefts are a big problem in cities. Table 14.3.1 displays data on motor vehicle 
thefts per 100,000 population in the year 1997 for 15 randomly selected large U.S. cities 
(source: Statistical Abstracts of the United States, 1999). 


Table 14.3.1 


Chicago, IL        1215.1    San Antonio, TX    830.0 
Columbus, OH       1109.9    Charlotte, NC      780.1 
Nashville, TN      1536.5    Tucson, AZ         1403.3 
Albuquerque, NM    1797.8    Atlanta, GA        1869.7 
Sacramento, CA     1630.5    St. Louis, MO      2152.8 
Toledo, OH         939.7     Tampa, FL          1410.0 
Birmingham, AL     1219.7    Anchorage, AK      532.8 
Norfolk, VA        519.9 


Label all possible outliers using: 
(a) (i) z-Score test, (ii) distribution-free test, and (iii) modified z-score test. 
(b) Box plot. 


14.3.2. For the data of Example 14.2.1, label all possible outliers using: 
(a) (i) z-Score test, (ii) distribution-free test, and (iii) modified z-score test. 
(b) Box plot. 


14.3.3. The following data represent test scores of 36 randomly selected students from a large 
mathematics class. 


67 63 39 80 64 95 90 93 21 36 44 66 
100 66 72 34 78 66 68 98 74 81 71 100 
60 50 81 66 90 89 86 49 77 63 58 43 

Label all possible outliers using: 
(a) (i) z-Score test, (ii) distribution-free test, and (iii) modified z-score test. 
(b) Box plot. 


14.3.4. The following data represent the number of days in 1997 on which selected U.S. metropoli- 
tan areas failed to meet acceptable air-quality standards at trend sites (source: The World 
Almanac and Book of Facts 2000). 


26 55 30 8 9 15 0 12 3 50 16 
47 0 63 3 0 19 23 3 32 15 20 
106 2 15 1 14 0 1 44 28 

Label all possible outliers using: 

(a) (i) z-Score test, (ii) distribution-free test, and (iii) modified z-score test. 

(b) Box plot. 


14.4 CHECKING ASSUMPTIONS 


With some exceptions, checking data for agreement with assumptions is not a topic that is strongly 
emphasized in other textbooks at this level. Even in more advanced books, this step is frequently omit- 
ted. In order for the inferences to work correctly, the measured variables must conform to assumptions 
that underlie the statistical procedures to be applied. In hypothesis testing such as the t-tests and 
ANOVA, we made some fundamental assumptions that the random samples need to satisfy for the 
tests to yield correct results. 


As an example, the basic assumptions underlying a t-test are: 


(i) The sample comes from a normal population. 
(ii) The sample is random. In case of two sample tests (excluding paired tests), the measurements 
in one sample are independent of those in the other sample. 
(iii) When we are given two random samples, most of the results assume the equality of 
population variances, that is, $\sigma_1^2 = \sigma_2^2$. This assumption is called the homogeneity of 
variances. The test for equality of variance may have to be performed first if we doubt the 
equality of the variance. 


Likewise, analysis of variance is based on a model that requires the following three primary 
assumptions: 


(i) The samples come from normal populations. 
(ii) Each of the samples is randomly selected from each group, and the samples are independent 
of each other. 
(iii) The population variances for all the samples are equal. That is, if we have $k$ populations with 
variances $\sigma_i^2$, $i = 1, 2, \ldots, k$, then $\sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$. 


When we say we have a random sample, we implicitly assume that the data are identically distributed. 
The presence of outliers in an observed sample may affect such an assumption. We now explain a 
few tests for checking these assumptions such as the assumptions of normality, data transformations, 
and equality of variances. 


14.4.1 Checking the Assumption of Normality 


We start with the assumption of normality. Let us consider the example of randomly selected scores 
of 28 calculus students. 


Example 14.4.1 
Given in the following table are the test scores of 28 randomly selected students from a calculus 1 class. 

86 95 82 53 98 85 87 80 49 71 99 40 96 97 

94 89 69 23 72 76 78 91 96 77 77 91 35 47 


Construct a dotplot and a histogram, and compute the percentage of observations that fall in the intervals 
$\bar{x} \pm s$, $\bar{x} \pm 2s$, and $\bar{x} \pm 3s$. 


Solution 
The dotplot is shown in Figure 14.6. 


FIGURE 14.6 Dotplot of student scores. 


The histogram is shown in Figure 14.7. 

FIGURE 14.7 Histogram for student scores. 


We have $\bar{x} = 76.18$ and $s = 20.99$. Also, 71% of the random sample (i.e., 20 observations) fall in the 
interval 76.18 ± 20.99 = (55.19, 97.17). There are 27 observations, or about 96%, that fall in 
76.18 ± 41.98 = (34.20, 118.16), and all the observations fall in 76.18 ± 62.97 = (13.21, 139.15). This 
suggests that the data set is approximately normally distributed. This procedure is the empirical rule. 
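The empirical-rule check can be reproduced with a few lines of Python. The sketch below counts the observations within one, two, and three sample standard deviations of the sample mean; the function name is our own. For a normal population these proportions would be roughly 68%, 95%, and 99.7%.

```python
import statistics

# Test scores of 28 calculus students (Example 14.4.1).
scores = [86, 95, 82, 53, 98, 85, 87, 80, 49, 71, 99, 40, 96, 97,
          94, 89, 69, 23, 72, 76, 78, 91, 96, 77, 77, 91, 35, 47]


def empirical_rule_counts(data):
    """Number of observations within k sample standard deviations of the mean."""
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)  # divisor n - 1
    return {k: sum(1 for x in data if abs(x - x_bar) <= k * s) for k in (1, 2, 3)}


counts = empirical_rule_counts(scores)
for k, count in counts.items():
    print(f"within {k}s: {count} of {len(scores)} ({100 * count / len(scores):.0f}%)")
```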


For the previous example, we have seen that the dotplot does not suggest any normality. A histogram 
also does not suggest any normality (see Figure 14.7). However, if we used the empirical rule as a test 
for normality, the data suggest normality. Clearly this leads to a conflicting situation with a simple 
theoretical check suggesting normality, while visual displays suggest nonnormality. In this case more 
sophisticated procedures are warranted. 


Sometimes, skewness and kurtosis can be used to test for tilt in and peakedness of a distribution. 
After getting skewness and kurtosis from the descriptive statistics, divide these by their standard 
errors. If both ratios are within the ±2 range, the data can be considered normal. 
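A minimal sketch of this check follows. It uses moment-based skewness and excess kurtosis together with the large-sample standard errors sqrt(6/n) and sqrt(24/n). Both choices are assumptions of this sketch: statistical packages typically use small-sample corrections, so their output may differ somewhat.

```python
import math

# Test scores of 28 calculus students (Example 14.4.1).
scores = [86, 95, 82, 53, 98, 85, 87, 80, 49, 71, 99, 40, 96, 97,
          94, 89, 69, 23, 72, 76, 78, 91, 96, 77, 77, 91, 35, 47]


def skew_kurtosis_ratios(data):
    """Return (skewness / SE, excess kurtosis / SE) using moment estimators."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    se_skew = math.sqrt(6 / n)       # large-sample approximation (assumed)
    se_kurtosis = math.sqrt(24 / n)  # large-sample approximation (assumed)
    return skew / se_skew, excess_kurtosis / se_kurtosis


skew_ratio, kurtosis_ratio = skew_kurtosis_ratios(scores)
print(f"skewness ratio = {skew_ratio:.2f}, kurtosis ratio = {kurtosis_ratio:.2f}")
```

A negative skewness ratio points to a longer left tail, a positive one to a longer right tail.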


We mention some sophisticated testing procedures for two of the most important of the parametric 
assumptions when running single-factor trials, namely, normality and homogeneity of variance. We 
have already seen in Project 4C how to construct a normal probability plot and to check for normality. 
In this chapter, we will use the Minitab normal plot to check for normality. Figure 14.8 graphs a normal 
probability plot (using Minitab) for Example 14.4.1. 


We see that the test scores follow the straight line on the normal probability plot pretty well. The 
serious departures occur for the last four scores, because the values fall well above the line. This 
suggests normality with possible outliers. 


[Normal probability plot (Minitab). Average: 76.1756; Std Dev: 20.5979. Anderson-Darling normality 
test: A-Squared = 1.256, p-value = 0.002.] 

FIGURE 14.8 Normal probability plot of student scores. 

[Normal probability plot (Minitab). Std Dev: 91.7557. Anderson-Darling normality test: 
A-Squared = 0.510, p-value = 0.022.] 

FIGURE 14.9 Normal probability plot for the lifetime of light bulbs. 


It should be noted that for skewed data, in the normal probability plot, positively skewed data fall 
below the straight line, whereas the negatively skewed data rise above the straight line. A normal 
probability plot for the lifetime of 30 light bulbs in Example 14.2.1 is given in Figure 14.9. 


This graph suggests that the data may not be normal and appear to be negatively skewed. 
Figure 14.10 is a normal probability plot for 30 data points generated from a standard normal 
distribution. 


In this textbook, we have presented only simple graphical tests for testing of normality. We should 
mention that in the literature, a variety of procedures for testing for normality are available, including 
the Kolmogorov-Smirnov test, the Shapiro-Wilk W test, and the Lilliefors test. Some of these tests 
are incorporated in statistical software packages such as Minitab and could be performed as easily 
as the graphical tests. If the sample size is very small, with any of these tests it may be difficult 
to detect assumption violations. It is important to keep in mind that these tests are only rough 
indicators of assumption violations. For small sample sizes, even when the tests show that none of 
the test assumptions is violated, a normality test may not have sufficient power to detect a significant 
departure from normality, though it is present. 
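For readers without Minitab, the coordinates of a normal probability plot can be sketched with the Python standard library. The plotting positions (i - 0.375)/(n + 0.25) used here are one common convention and are an assumption of this sketch, not necessarily the ones Minitab uses. The correlation between the ordered data and the normal scores gives a rough numerical measure of how straight the plot is; it is not a substitute for the formal tests named above.

```python
import math
from statistics import NormalDist

# Test scores of 28 calculus students (Example 14.4.1).
scores = [86, 95, 82, 53, 98, 85, 87, 80, 49, 71, 99, 40, 96, 97,
          94, 89, 69, 23, 72, 76, 78, 91, 96, 77, 77, 91, 35, 47]


def normal_scores(n):
    """Standard normal quantiles at the assumed plotting positions."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]


def probability_plot_correlation(data):
    """Correlation between the ordered data and the normal scores."""
    xs = sorted(data)
    zs = normal_scores(len(xs))
    x_bar = sum(xs) / len(xs)
    z_bar = sum(zs) / len(zs)
    s_xz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_zz = sum((z - z_bar) ** 2 for z in zs)
    return s_xz / math.sqrt(s_xx * s_zz)


# Points (normal score, ordered observation) to plot.
plot_points = list(zip(normal_scores(len(scores)), sorted(scores)))
scores_r = probability_plot_correlation(scores)
print(f"probability plot correlation = {scores_r:.3f}")
```

Points that fall close to a straight line, with a correlation near 1, are consistent with normality.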


[Normal probability plot (Minitab). Average: 0.0567849; Std Dev: 0.901967. Anderson-Darling 
normality test: A-Squared = 0.562, p-value = 0.124.] 

FIGURE 14.10 Normal probability plot of data from a standard normal distribution. 


14.4.2 Data Transformation 


Data transformation uses mathematical operations (filters) on each of the observations, transforming 
the original scores into a new set of scores. An appropriate transformation may (i) reduce the 
influence of outliers, (ii) make data from a nonnormal distribution more normal, and/or (iii) make 
the variances of different data sets more homogeneous. Some of the more commonly used 
transformations are (i) power transformations such as square root, (ii) logarithm, (iii) reciprocal, and 
(iv) arcsine. Used correctly, data transformation can be a useful tool for the practitioner. Some of these 
transformations can be put into a popular class of transformations called the Box-Cox power law 
transformation 

$$y = \frac{x^{\lambda} - 1}{\lambda},$$

where $\lambda$ can be optimally adjusted from 0 to 1. For example, as $\lambda \to 0$, we obtain the $y = \ln x$ 
(logarithmic filter) transformation, and when $\lambda = 1/2$, we get the square root transformation (up to 
a linear rescaling). 
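The Box-Cox family can be sketched as a small function. The form (x^lambda - 1)/lambda with the logarithm as the limiting case at lambda = 0 is the standard one; the function name and the tolerance used to detect lambda = 0 are assumptions of this sketch.

```python
import math


def box_cox(x, lam):
    """Box-Cox power transformation of a positive value x."""
    if x <= 0:
        raise ValueError("the Box-Cox transformation requires x > 0")
    if abs(lam) < 1e-12:  # limiting case: lambda -> 0 gives ln x
        return math.log(x)
    return (x ** lam - 1) / lam


# lambda = 1/2 is the square root up to a linear rescaling: 2 * (sqrt(x) - 1).
print(box_cox(4.0, 0.5))
# A very small lambda is already close to the logarithm.
print(box_cox(5.0, 1e-6), math.log(5.0))
```

In practice one would apply `box_cox` to every observation for several values of lambda and keep the value whose transformed data look most nearly normal.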


As we have seen in Project 9A, it is sometimes possible to use appropriate data transformations to 
transform nonnormal data into approximately normal data. Then we can use this normality property 
to perform statistical analysis on these transformed values. For instance, if the distribution of data 
has a long tail (which could be seen by drawing a histogram of observations) or a few laggards on 
the right (which could be seen by drawing a dotplot of observations), the $\sqrt{x}$ or $\ln x$ transforms will 
pull larger values down further than they pull the smaller or center values. Sometimes it is necessary 
to try several different transformations (trial and error) in order to find one that is more appropriate. 




Example 14.4.2 
Consider the following data from an experiment. 


1.15 3.84 0.01 2.06 3.28 2.61 0.59 3.19 1.32 1.07 
7.80 1.74 0.25 0.21 3.42 4.52 0.43 0.38 0.07 1.26 
4.03 7.28 0.85 3.24 0.62 

(a) Draw a histogram and normal plot. 
(b) Take the transform $y = \sqrt{x}$ and draw a histogram and normal plot for the transformed data. 


Solution 
(a) The histogram and normal plots for the data are shown in Figures 14.11 and 14.12. 


FIGURE 14.11 A histogram of the data. 

[Normal probability plot (Minitab). Average: 2.21297; Std Dev: 2.12252; N of data: 25. 
Anderson-Darling normality test: A-Squared = 1.033, p-value = 0.003.] 

FIGURE 14.12 Normal probability plot of the data. 


These graphs clearly show that the data do not follow a normal distribution. 
(b) The histogram and normal plot for the transformed data are shown in Figures 14.13 and 14.14. 
With this transformation (filter), we can see that the filtered data follow normality. 

FIGURE 14.13 Histogram of the transformed data. 
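As a rough numerical companion to the plots, the effect of the square root filter can be seen by comparing the moment-based skewness before and after the transformation. In this sketch the three entries printed without a decimal point (780, 452, 085) are read as 7.80, 4.52, and 0.85, which is an assumption; the function name is also our own.

```python
import math

# Data of Example 14.4.2 (decimal points restored in three entries; see above).
data = [1.15, 3.84, 0.01, 2.06, 3.28, 2.61, 0.59, 3.19, 1.32, 1.07,
        7.80, 1.74, 0.25, 0.21, 3.42, 4.52, 0.43, 0.38, 0.07, 1.26,
        4.03, 7.28, 0.85, 3.24, 0.62]


def skewness(values):
    """Moment-based skewness m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


raw_skew = skewness(data)
sqrt_skew = skewness([math.sqrt(v) for v in data])
print(f"skewness before: {raw_skew:.2f}, after sqrt: {sqrt_skew:.2f}")
```

A skewness closer to zero after the transformation agrees with the visual impression that the filtered data are more nearly symmetric.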


We have only pointed out transformations in single-variable cases. The transformation methods are 
also useful in multivariable and multi-factor studies; however, these involve more difficult analysis. 


14.4.3 Test for Equality of Variances 


Now we discuss the tests for equality of variances, that is, the tests for heteroscedasticity. Our recom- 
mendation is that, in a real-world problem, after accounting for outliers one should conduct tests for 
normality and heterogeneity of variance routinely before analyzing any data. Here, we give two tests. 
One, for the two-sample case, is based on the F-test, and for the multisample case we give Levene’s test 
based on analysis of variance procedures. Albert Madansky’s book Prescriptions for Working Statisticians 
(Springer-Verlag, 1988) gives various other tests for normality and heteroscedasticity. 


[Normal probability plot (Minitab). Average: 1.20944; Std Dev: 0.7242; N of data: 25. 
Anderson-Darling normality test: A-Squared = 0.289, p-value = 0.040.] 

FIGURE 14.14 Normal probability plot of the transformed data. 


(a) Testing Equality of Variances for Two Normal Populations 

The following procedure has already been discussed in the hypothesis testing chapter. For the sake of 
completeness, here we again briefly discuss this procedure. Let $X_{11}, \ldots, X_{1n_1}$ be a random sample 
from an $N(\mu_1, \sigma_1^2)$ distribution and $X_{21}, \ldots, X_{2n_2}$ be a random sample from an $N(\mu_2, \sigma_2^2)$ 
distribution. Assume that the $X_{1i}$'s and $X_{2j}$'s are independent of each other for all $i, j$. Let 

$$\bar{x}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1i} \qquad \text{and} \qquad \bar{x}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{2i}.$$

Assuming that $\mu_1$ and $\mu_2$ are unknown, we can test the hypothesis that $\sigma_1^2 = \sigma_2^2$ based on the ratio 

$$F = \frac{s_1^2}{s_2^2} = \frac{\sum_{i=1}^{n_1}(x_{1i} - \bar{x}_1)^2/(n_1 - 1)}{\sum_{i=1}^{n_2}(x_{2i} - \bar{x}_2)^2/(n_2 - 1)}.$$


We know that $(n_1 - 1)s_1^2/\sigma_1^2$ has a $\chi^2(n_1 - 1)$ distribution and $(n_2 - 1)s_2^2/\sigma_2^2$ has a $\chi^2(n_2 - 1)$ 
distribution. Therefore, under the null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$, the statistic $F$ has an 
$F(n_1 - 1, n_2 - 1)$ distribution. 


Based on the alternative hypothesis, we will reject the equality of variance assumption if the test 
statistic falls into the appropriate tail of the F-distribution. For example, if $H_a: \sigma_1^2 > \sigma_2^2$ with 
$\alpha = 0.05$, we would reject $H_0$ when $F > F_{0.95}(n_1 - 1, n_2 - 1)$, and if $H_a: \sigma_1^2 < \sigma_2^2$ with $\alpha = 0.05$, 
we would reject $H_0$ when $F < F_{0.05}(n_1 - 1, n_2 - 1)$. When $H_a: \sigma_1^2 \ne \sigma_2^2$ with $\alpha = 0.05$, we would 
reject $H_0$ when $F > F_{0.975}(n_1 - 1, n_2 - 1)$ or $F < F_{0.025}(n_1 - 1, n_2 - 1)$. It should be noted that in the 
case of a two-sided alternative, this procedure is not the best one in the sense of minimizing the 
type II error. However, for simplicity, we will not discuss the optimal two-tailed procedure. 


Example 14.4.3 
An aquaculture farm takes water from a stream and returns it after it has circulated through the fish tanks. 
Suppose the owner thinks that, because the water circulates rather quickly through the tank, there is little 
organic matter in the effluent. To find out, some samples of the water are taken at the intake and other 
samples are taken at the downstream outlet, and tests are performed for biochemical oxygen demand 
(BOD). If BOD increases, it can be said that the effluent contains more organic matter than the stream can 
handle. Table 14.2 gives the data for this problem. 


(a) Using normal plots, check for normality of each sample. 
(b) Test for the equality of variances of the BOD for the downstream and upstream samples at $\alpha = 0.05$. 

Solution 
(a) The normal plots are shown in Figures 14.15 and 14.16. 
The BOD data for the downstream and upstream samples are approximately normal. 


Table 14.2 

Upstream   Downstream 
7.863      8.132 
5.714      9.128 
5.871      7.574 
6.479      8.678 
7.124      9.336 
7.539      8.798 
6.682      8.457 
5.877      9.756 
6.227      8.548 
6.771      7.992 

[Normal probability plot of the upstream data (Minitab). Average: 6.6147; Std Dev: 0.725827; 
N of data: 10. Anderson-Darling normality test: A-Squared = 0.236, p-value = 0.716.] 

FIGURE 14.15 Normal plot of upstream data. 

FIGURE 14.16 Normal plot of downstream data. 


(b) We test $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \ne \sigma_2^2$. We have $n_1 = n_2 = 10$ and $\alpha = 0.05$. Because the 
normal plots of each sample conform with the normality assumption, we can use the F-statistic: 

$$F = \frac{s_1^2}{s_2^2} = \frac{(0.729)^2}{(0.654)^2} = 1.2425.$$

From the F-table, the rejection region is $\{F < F_{0.025}(9, 9) = 0.248\}$ or $\{F > F_{0.975}(9, 9) = 4.03\}$. 
Because the observed value of the test statistic does not fall in the rejection region, we do not 
reject $H_0$; the sample evidence is consistent with equal population variances. 
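The F-statistic of part (b) can be recomputed from the raw data in Table 14.2. This sketch takes the two critical values from the text (they come from an F-table, not from the code) and uses the sample variance with divisor n - 1. Because it works from unrounded standard deviations, the ratio may differ slightly in the third decimal from the value obtained with the rounded figures.

```python
import statistics

# BOD measurements from Table 14.2.
upstream = [7.863, 5.714, 5.871, 6.479, 7.124, 7.539, 6.682, 5.877, 6.227, 6.771]
downstream = [8.132, 9.128, 7.574, 8.678, 9.336, 8.798, 8.457, 9.756, 8.548, 7.992]

# Ratio of sample variances (divisor n - 1 in each).
f_stat = statistics.variance(upstream) / statistics.variance(downstream)

# Two-sided critical values for alpha = 0.05, as quoted in the text.
f_lower = 0.248  # F_0.025(9, 9)
f_upper = 4.03   # F_0.975(9, 9)

reject = f_stat < f_lower or f_stat > f_upper
print(f"F = {f_stat:.4f}, reject H0: {reject}")
```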
(b) Test for Equality of Variances, k > 2 Populations 
Generalizing to $k$ populations, let $X_{i1}, X_{i2}, \ldots, X_{in_i}$, $i = 1, 2, \ldots, k$, be $k$ random samples from 
$N(\mu_i, \sigma_i^2)$ distributions, with both the $\mu_i$'s and the $\sigma_i$'s unknown. Also assume that $X_{ij}$ and $X_{i'j'}$ 
are independent for all $(i, j) \ne (i', j')$. We wish to test the hypothesis $H_0: \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2$ 
against $H_a$: At least one of the $\sigma_i^2$ is different. There are many tests available. One of the basic 
graphical procedures is to use side-by-side box plots (see Example 10.3.1). We describe Levene's test 
based on the analysis of variance (source: Levene, 1960). 


Let $y_{ij} = |x_{ij} - \bar{x}_i|$. Now perform an analysis of variance test for equality of the means of the $y_{ij}$. Let 

$$n = \sum_{i=1}^{k} n_i, \qquad \bar{y}_i = \frac{1}{n_i}\sum_{j=1}^{n_i} y_{ij}, \qquad \text{and} \qquad \bar{y} = \sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij} \Big/ \sum_{i=1}^{k} n_i.$$

The analysis of variance statistic is 

$$z = \frac{\sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2/(k - 1)}{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2/(n - k)} = \frac{\text{MST}}{\text{MSE}}.$$

Recall that MST (mean square for treatments) and MSE (mean square error) were defined in Section 
10.3; the MST is a measure of the variability between the sample means of the groups and the MSE 
is a measure of variability within the groups. At the $\alpha = 0.05$ level of significance, the rejection 
region is $\{z > F_{0.95}(k - 1, n - k)\}$. 


It should be noted that the $y_{ij}$ are not independent, but the analysis of variance method is found to be 
robust against the deviation from this assumption of independence. 


Example 14.4.4 

The three random samples in Table 14.3 are independently obtained from three different normal 
populations. 


Table 14.3 

Sample 1   Sample 2   Sample 3 
64         56         81 
84         74         92 
75         69         84 
77 
80 


At the $\alpha = 0.05$ level of significance, test for equality of variances. 


Solution 
We test $H_0: \sigma_1^2 = \sigma_2^2 = \sigma_3^2$ versus $H_a$: Not all the $\sigma_i^2$ are equal. For this sample, $\bar{x}_1 = 76$, 
$\bar{x}_2 = 66.33$, and $\bar{x}_3 = 85.67$. Also $n = 11$ and $k = 3$. Letting $y_{ij} = |x_{ij} - \bar{x}_i|$, we obtain the 
following $y_{ij}$ values: 

Sample 1   Sample 2   Sample 3 
12         10.33      4.67 
8          7.67       6.33 
1          2.67       1.67 
1 
4 

The test statistic is 

$$z = \frac{\sum_{i=1}^{k} n_i(\bar{y}_i - \bar{y})^2/(k - 1)}{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar{y}_i)^2/(n - k)} = \frac{\text{MST}}{\text{MSE}} = \frac{5.5}{16.5} = 0.33.$$

From the F-table, the 95% point is $F_{0.95}(2, 8) = 4.46$. Hence the rejection region is $\{z > 4.46\}$. Because 
the observed value of $z = 0.33$ does not fall in the rejection region, the null hypothesis is not rejected, and 
we conclude that the assumption of equality of variances seems to be justified. 
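Levene's statistic for this example can be recomputed with a short function. The table as printed shows only four values for Sample 1, while the solution uses n = 11 and a Sample 1 mean of 76; the sketch therefore assumes a fifth Sample 1 observation of 80, the only value consistent with those figures. The function name is our own.

```python
def levene_statistic(samples):
    """Return (z, MST, MSE) for Levene's test based on |x_ij - mean_i|."""
    k = len(samples)
    n = sum(len(s) for s in samples)
    # Absolute deviations from each sample mean.
    y = [[abs(x - sum(s) / len(s)) for x in s] for s in samples]
    y_bar_i = [sum(g) / len(g) for g in y]
    y_bar = sum(sum(g) for g in y) / n
    mst = sum(len(g) * (m - y_bar) ** 2 for g, m in zip(y, y_bar_i)) / (k - 1)
    mse = sum((v - m) ** 2 for g, m in zip(y, y_bar_i) for v in g) / (n - k)
    return mst / mse, mst, mse


sample_1 = [64, 84, 75, 77, 80]  # fifth value (80) is inferred, see above
sample_2 = [56, 74, 69]
sample_3 = [81, 92, 84]

z, mst, mse = levene_statistic([sample_1, sample_2, sample_3])
critical = 4.46  # F_0.95(2, 8), as quoted in the text
print(f"MST = {mst:.2f}, MSE = {mse:.2f}, z = {z:.2f}, reject H0: {z > critical}")
```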


Through our tests, if we find that the homogeneity of variance of the data is violated significantly, 
then nonparametric tests are more appropriate. Another popular test for equality of variance is 
Bartlett's test. 


14.4.4 Test of Independence 


Almost all the results in this book assume that we have independent random samples. In the sit- 
uation where we suspect that the sample data may not be independent, perform a run test as 
described in Project 12B to test for independence. There are parametric procedures available to test 
independence; however, the run test is independent of the distributional assumptions and simpler 
to perform. In general, whether the two samples are independent of each other is decided by the 
structure of the experiment from which they arise. In case of correlated samples, such as a set of 
pre- and posttest observations on the same subject that are not independent, a two-sample paired 
test may be more appropriate. Another popular method used to check for independence is the 
chi-squared test of independence; see Section 7.6.2. For time series data, the Durbin-Watson test 
(http://www.alchemygroup.net/Permutation%20Durbin-Watson%20Final.pdf) is effective. 
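A run test of the kind mentioned above can be sketched as follows. Project 12B is not reproduced here, so this version is an assumption: it uses the common Wald-Wolfowitz form, classifying each observation as above or below the sample median, dropping ties with the median, and applying the large-sample normal approximation for the number of runs.

```python
import math
import statistics


def runs_test_z(data):
    """Return (number of runs, z statistic) for runs above/below the median."""
    median = statistics.median(data)
    signs = [x > median for x in data if x != median]  # ties dropped (assumed)
    n1 = sum(signs)            # observations above the median
    n2 = len(signs) - n1       # observations below the median
    if n1 == 0 or n2 == 0:
        raise ValueError("need observations on both sides of the median")
    runs = 1 + sum(1 for a, b in zip(signs, signs[1:]) if a != b)
    n = n1 + n2
    mean_runs = 2 * n1 * n2 / n + 1
    var_runs = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    return runs, (runs - mean_runs) / math.sqrt(var_runs)


# A steadily increasing sequence has very few runs (strong dependence).
trend_runs, trend_z = runs_test_z(list(range(1, 21)))
# A strictly alternating sequence has too many runs.
alt_runs, alt_z = runs_test_z([0, 1] * 10)
print(trend_runs, round(trend_z, 2), alt_runs, round(alt_z, 2))
```

A value of |z| larger than about 1.96 would lead to rejecting randomness of the ordering at the 5% level under this approximation.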


In practical sampling situations, the underlying populations are unlikely to be exactly normally 
distributed with homogeneity of variances. Both t-tests and ANOVA are robust for reasonable depar- 
tures in some of these assumptions. However, these tests may not be robust with respect to certain 
other assumption violations. For example, ANOVA is quite sensitive to the violation of independence 
assumption. These factors need to be given special attention in data analysis. 


EXERCISES 14.4 


14.4.1. The scores of 25 randomly selected students from a large calculus class are given below. 
47 73 90 22 68 86 94 32 88 86 


80 97 48 70 61 82 67 73 78 55 
63 59 42 46 90 
(a) Test the data for normality. 
(b) If the data are not normal, try a suitable transformation (filter) to make the transformed 
data normal. 


14.4.2. Refer to Example 14.3.1. Suppose we use the transformation $y_i = \ln x_i$ for each observation. 
(a) Test whether the transformed data are normal. 
(b) Determine whether the data value 18 is still an outlier in the transformed data set. 


14.4.3. The data shown in the following table relate to the concealed weapons permits issued in 
13 randomly selected Florida counties in 1996. 


31,603 20,873 15,963 10,294 8,956 7,901 6,820 
5,695 5,485 4,827 3,969 3,278 1,731 
(a) Test whether the data are normal. 
(b) If not, try a suitable transformation to make the transformed data normal. 


14.4.4. The following table represents a summary by state for Medicare enrollment (in thousands) 
for 15 randomly selected states in 1998 (source: Statistical Abstracts of the United States, 
1999). 
665 3,757 623 757 541 448 478 2,728 103 771 
224 86 623 1,373 713 
(a) Test to determine whether the data are normal. 
(b) If not, try a suitable transformation to make the transformed data approximately 
normal. 
(c) Test for outliers. If an observation is extreme, would you classify it as an outlier? 


14.4.5. Given in the following table are 15 randomly selected state expenditures (in millions of 
dollars) for the fiscal year 1997 (source: The World Almanac and Book of Facts 2000). 


5,722 7,685 13,862 21,975 35,302 4,441 16,200 25,791 
4,808 5,130 2,426 39,296 4,002 6,818 7,145 


(a) Test the data for normality. 
(b) If the data are not normal, try a suitable transformation to make the transformed data
approximately normal. 


14.4.6. For the data of Exercise 14.3.4, 
(a) Test whether the data are normal. 
(b) If not, try a suitable transformation to make the transformed data approximately 
normal. 


14.4.7. The following data give in-city mileage per gallon for 25 small and midsize cars (source: 
Money Magazine, March 2001). 


25 23 20 20 27 26 20 32 25 22 
24 21 28 20 22 19 21 29 23 32 
23 52 24 24 22
(a) Test to determine whether the data are normal.

(b) If not, try a suitable transformation to make the transformed data approximately 
normal. 

(c) Test for outliers. If an observation is extreme, would you classify it as an outlier? 


14.4.8. The following table gives in-state tuition costs (in dollars) for 15 randomly selected colleges 
taken from a list of the 100 best values in public colleges (source: Kiplinger’s Magazine, 
October 2000). 
3788 4065 2196 7360 5212 4137 4060 3956 3975 7395 
4058 3683 3999 3156 4354 
(a) Test for outliers. 
(b) Test whether the data are normal. 


14.4.9. For the data of Exercise 14.2.1, test for equality of variances. 
14.4.10. For the data of Exercise 14.2.3, test for equality of variances. 


14.4.11. The following data represent a random sample of end-of-year bonuses for lower-level 
managerial personnel employed by a large firm. Bonuses are expressed in percentage of 
yearly salary. 


Female | 6.2 | 9.2 | 8.0 | 7.7 | 8.4 | 9.1 | 7.4 | 6.7
Male | 8.9 | 10.0 | 9.4 | 8.8 | 12.0 | 9.9 | 11.7 | 9.8


Test for equality of variances. State any assumptions you have made, and interpret your 
result. 


14.4.12. In an effort to investigate the premium charged by insurance companies for auto insurance, 
an agency randomly selects a few drivers who are insured by three different companies. 
These individuals have similar cars, driving records, and level of coverage. Table 14.4.1 
gives the premiums paid per 6 months by these drivers with these three companies. 


Table 14.4.1 
Company I    Company II    Company III
396 348 378 
438 360 330 
336 522 294 
318 474 
432 


Test for equality of variances. State any assumptions you have made, and interpret your 
result.


14.4.13. Three classes in elementary statistics are taught by three different persons, a regular faculty
member, a graduate teaching assistant, and an adjunct from outside the university. At the 
end of the semester, each student is given a standardized test. Five students are randomly 
picked from each of these classes, and their scores are as shown in Table 14.4.2. 


Table 14.4.2 

Faculty Teaching assistant Adjunct 
93 88 86 
61 90 56 
87 76 73 
75 82 90 
92 58 47 


Test for equality of variances. State any assumptions you have made, and interpret your 
result. 


14.5 MODELING ISSUES 


A model is a theoretical description of a physical phenomenon, expressed in the language of
mathematical statistics. Even though interpretations can be developed by analogy, past experience, or
intuition, the scientific approach requires a model for the phenomenon of interest. Models are
simplifications (or approximations) of real-world situations and are designed to make it easier to
identify and to understand relationships among variables. A good model is crucial for accurate
estimation, forecasting, or prediction. If the observed data show a good fit to the estimates obtained
through the model, we consider the model to be an adequate representation of the real-world
phenomenon. If not, the model must be improved, either by incorporating additional variables or by
modifying the equations that define the relationships. In statistical modeling, it is important not to
lose perspective on the essential purpose of the modeling effort. The emphasis should be on making
these models work on real data sets rather than on spending a large amount of time on the capabilities
of the models. Even though the study of the properties and abilities of models is important, equally
important is the ability to know when and how to fit models to a particular data set. For example, a
regression line is a two-parameter model that depicts a linear dependence of one variable on another.
Again, it is not our objective to discuss all the issues related to statistical modeling. We will only
briefly discuss some simple issues relevant to modeling.


14.5.1 A Simple Model for Univariate Data 

Suppose that we have a data set that characterizes a phenomenon of interest. Suppose our problem is 
to create a statistical model for the data set in the form of a probability distribution from which the 
data set came. First we create a dotplot and summary of the basic statistics. The dotplot will provide
us with an idea of the probability distribution of the data and any unusual behavior of the data that
will not be apparent from the basic statistics such as sample mean and sample standard deviation.
Having identified the probability distribution of the sample statistic, we can proceed to obtain 95%
confidence limits on parameters such as the mean and variance. In addition, we can obtain a 95%
prediction interval of the next observation using the expression


$$\bar{y} \pm (t\text{-value})\, s \sqrt{1 + \frac{1}{n}}.$$


Note that the prediction interval is always wider than the corresponding confidence interval. The 
confidence interval provides a measure of reliability for estimating a parameter. The prediction interval 
provides a measure of reliability for the prediction of an observation. Thus, the prediction interval 
needs to account for estimation error as well as the natural variability of a single observation. These 
steps can be considered as the first modeling effort for univariate data. Note that if we have a small 
sample size, using a t-value in the confidence interval and/or prediction interval supposes a modeling 
assumption of normality for the corresponding population. The preliminary verification of this is 
done by the dotplot. For more detailed verification of this modeling assumption, use the normal 
plots. 
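The factor $\sqrt{1 + 1/n}$ in the prediction interval can be motivated by a short variance calculation. As a sketch, assume that the observations $Y_1, \ldots, Y_n$ and a future observation $Y_{n+1}$ are independent with common variance $\sigma^2$; these symbols are introduced here only for illustration.

```latex
\operatorname{Var}\left(Y_{n+1} - \bar{Y}\right)
  = \operatorname{Var}(Y_{n+1}) + \operatorname{Var}(\bar{Y})
  = \sigma^2 + \frac{\sigma^2}{n}
  = \sigma^2\left(1 + \frac{1}{n}\right)
```

Replacing $\sigma$ by the sample standard deviation $s$ gives the half-width $(t\text{-value})\, s\sqrt{1 + 1/n}$, whereas the confidence interval for the mean uses only the second term and has half-width $(t\text{-value})\, s\sqrt{1/n}$. This is why the prediction interval is always the wider of the two.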




Example 14.5.1 
Consider the following data from an experiment: 


0.15 0.14 0.15 0.14 0.26 0.00 0.00 0.47 0.35 0.16
0.15 0.15 0.23 0.13 0.19 0.15 0.22 0.53 0.17 0.23
0.22 0.16 0.12 0.13 0.11 0.14 0.18 0.15 0.14 0.21
0.13 0.12 0.13 0.13 0.21 0.22 0.18 0.20 0.22 0.16
0.17 0.00 0.23 0.21 0.18 0.05 0.16 0.13 0.23 0.18
0.14 0.29 0.21 0.22 0.11 0.16 0.23 0.13 0.07 0.17
0.08 0.14 0.06 0.08 0.07 0.11 0.12 0.14 0.16 0.12
0.10 0.27 0.19 0.13 0.27 0.16 0.07 0.09 0.04 0.53
0.29 0.15 0.12 0.11 0.10 0.14 0.14 0.16 0.16 0.17
0.36 0.46 1.21 0.39 0.01 0.52 0.09 0.18 0.16 0.16
0.14 0.15 0.09 0.09 0.13 0.13 0.08 0.14 0.20 0.09
0.09 0.16 0.08 0.10 0.34 0.24 0.15 0.44 0.08 0.08
0.16 0.14 0.18 0.23 0.19 0.11 0.19 0.10 0.14 0.11
0.14 0.17 0.17 0.17 0.05 0.12 0.14 0.11 0.20 0.14
0.23 0.03 0.10 0.29 0.13 0.26 0.13 0.15 0.27 0.14
0.50 0.16 0.15 0.18 0.16 0.14 0.13 0.08 0.20 0.17
0.17 0.16 0.15 0.11 0.13 0.76 0.18 0.19 0.09 0.12
0.11 0.12 0.08 0.26 0.23 0.20 0.19 0.19 0.16 0.11
0.12 0.13 0.32 0.05 0.18 0.12 0.13 0.50 0.13 0.04
0.00 -0.11 0.18 0.15 0.14 0.15 0.02 0.20




(a) Obtain a dotplot. 

(b) Calculate the basic statistics, sample mean, sample median, and sample standard deviation. 
(c) Obtain a 95% confidence interval for the true mean. 

(d) Obtain a 95% prediction interval. 


Solution 
(a) Each dot in Figure 14.17 represents three points. 


FIGURE 14.17 Dotplot of the data.


(b) We can use Minitab’s describe command to obtain the following. 


        N     MEAN   MEDIAN   TRMEAN    STDEV   SEMEAN
C1    198  0.17038  0.15121  0.15982  0.13610  0.00967

           MIN      MAX       Q1       Q3
      -0.39575  1.22076  0.12059  0.19284


(c) Again using Minitab commands, we can obtain (where data are stored in C1),
MTB > Zinterval 95.0 0.136 c1.


THE ASSUMED SIGMA = 0.136
        N     MEAN    STDEV   SEMEAN   95.0 PERCENT C.I.
C1    198  0.17038  0.13610  0.00967   (0.15143, 0.18933)


(d) For the prediction interval, use the large-sample formula $\bar{y} \pm z_{\alpha/2}\, s \sqrt{1 + \frac{1}{n}}$ to obtain the 95%
prediction interval for the next observation as (-0.097, 0.4378).
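The intervals in parts (c) and (d) can be checked from the summary statistics alone. The sketch below is not Minitab's implementation; it assumes the large-sample normal approximation and takes the sample mean, standard deviation, and sample size from the describe output above. The function name `z_intervals` is hypothetical.

```python
import math
from statistics import NormalDist


def z_intervals(mean, s, n, level=0.95):
    """Large-sample confidence interval for the mean and prediction
    interval for the next observation, from summary statistics."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)      # about 1.96 for 95%
    ci_half = z * s / math.sqrt(n)                   # error in estimating the mean
    pi_half = z * s * math.sqrt(1.0 + 1.0 / n)       # adds single-observation variability
    ci = (mean - ci_half, mean + ci_half)
    pi = (mean - pi_half, mean + pi_half)
    return ci, pi


# Summary statistics reported for the 198 observations of the example.
ci, pi = z_intervals(mean=0.17038, s=0.13610, n=198)
print("95%% confidence interval: (%.5f, %.5f)" % ci)
print("95%% prediction interval: (%.4f, %.4f)" % pi)
```

With these inputs the prediction limits come out near -0.097 and 0.438, and the prediction interval is far wider than the confidence interval because it must cover a single new observation rather than the mean.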
14.5.2 Modeling Bivariate Data


When a scatterplot of bivariate data exhibits a linear pattern, the modeling is usually done using linear
regression to study their linear relationship as explained in Chapter 8. Clearly a linear relationship 
is desirable because it is easy to interpret, departure from linearity is easy to detect, and predicting 
dependent values from independent variables is straightforward. However, when a scatterplot shows 
a curved nonlinear pattern, then finding a “good” model that fits the observed data may not be very 
easy. Sometimes, instead of fitting a curve we may be able to transform the data so as to make the 
scatterplots of the transformed data look more linear. 


A popular statistical method used to straighten a plot is the so-called power transformation. The 
power transformation is defined by specifying an exponent, k, which could be a positive or negative 
real number, then computing each transformed value as the original value to the power k. Note that 
k=1/2 gives the square root transform. When k = 0, every transformed value is equal to 1. Instead 
it is customary to think of k = 0 as corresponding to a logarithmic transformation so as to unify 
the transformation concept. The power k = 1 corresponds to no transformation at all. Observe that 
these are the same transformations we have explained in Subsection 14.4.2 to transform nonnormal 
data into normal transformed data. The shape of the scatterplots should suggest an appropriate 
transformation. The four curves in Figure 14.18 represent possible shapes of scatterplots that are 
usually encountered in practice. 
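The power transformation described above, including the convention that k = 0 stands for the logarithm, can be sketched as a small function. The name `power_transform` is hypothetical, and the sketch assumes strictly positive data so that logarithms and fractional or negative powers are defined.

```python
import math


def power_transform(values, k):
    """Apply the power transformation with exponent k to positive data.

    By convention k = 0 is taken to mean the natural logarithm,
    and k = 1 leaves the data unchanged.
    """
    if k == 0:
        return [math.log(v) for v in values]
    return [v ** k for v in values]


data = [1.0, 4.0, 9.0]                # hypothetical positive observations
print(power_transform(data, 0.5))     # square root transform
print(power_transform(data, 0))       # logarithmic transform
print(power_transform(data, -1))      # reciprocal transform
```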


FIGURE 14.18 Possible shapes of a scatterplot (four panels, numbered 1 to 4, each plotting y against x).


We can use the following as a general guideline for making transformations. If we have a scatterplot 
that looks like plot 1 of Figure 14.18, then to straighten the plot, we should use a power k < 1 for 
x (the independent variable) and/or use a power k > 1 for y (the dependent variable). Similarly, for
curve 2, k > 1 for x and/or k < 1 (such as $\sqrt{y}$ or $\ln y$) for y. For curve 3, take k > 1 for x (such as $x^2$ or $x^3$)
and/or k > 1 for y. Finally, for curve 4, take k < 1 for x and/or k > 1 for y. Once we straighten the data
through transformations, obtain the least-squares equation of the line as explained in Chapter 8. By 
reversing the transformation (or solving for y in the transformed equation) we can obtain the original 
nonlinear relationship between x and y. 




Example 14.5.2 
For the following bivariate data: 


2.4 | 2.6 | 3.1 | 3.6 | 4.1 | 4.2 | 4.6 | 4.7

(a) Draw a scatterplot.
(b) Use appropriate transformation (if necessary) to linearize the scatterplot. 
(c) Fit the data to an appropriate curve. 


Solution 
(a) The scatterplot is shown in Figure 14.19. 


FIGURE 14.19 Scatterplot of the data.


This looks more like curve 4. 
(b) Let us use the transformation $x' = \ln x$ and $y' = y^2$. We will get the scatterplot shown in Figure 14.20.


FIGURE 14.20 Scatterplot of the transformed data.


This looks more linear.
(c) The regression line for the transformed data is $y' = 8.86x' - 6.96$. Therefore, for the original data,
$y^2 = 8.86 \ln x - 6.96$. The fitted curve is shown in Figure 14.21.
Looking at Figure 14.21, we can see that the data are only slightly nonlinear. In addition, using the equation,
for a given value of x we can predict the value of the response variable y. For instance, if x = 1.5, we
estimate $y^2$ to be -3.3676. Because a negative value of $y^2$ corresponds to no real y, this particular
prediction signals that the fitted curve should not be trusted at such a value of x.


FIGURE 14.21 Fitted curve.
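Reversing the transformation can be sketched in a few lines. The code below takes the fitted coefficients 8.86 and -6.96 from the example and evaluates the fitted relationship, which is linear in ln x and y squared. The function names are hypothetical, and taking the positive square root is an assumption based on the y-values in the example all being positive.

```python
import math

# Slope and intercept of the regression line fitted to the transformed data.
SLOPE = 8.86
INTERCEPT = -6.96


def predict_y_squared(x):
    """Fitted value of y^2 from the line y' = 8.86 x' - 6.96 with x' = ln x."""
    return SLOPE * math.log(x) + INTERCEPT


def predict_y(x):
    """Back-transform to y; returns None when the fitted y^2 is negative."""
    y_squared = predict_y_squared(x)
    if y_squared < 0:
        return None
    return math.sqrt(y_squared)


print(round(predict_y_squared(1.5), 4))   # fitted y^2 at x = 1.5
print(predict_y(1.5))                     # no real y exists for this x
print(predict_y(10.0))                    # fitted y at x = 10
```

Returning None for a negative fitted value makes it explicit that the curve cannot be inverted there, rather than silently producing an error or a complex number.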


There are various other modeling issues that one may encounter in applications. For example, in
multiple regression modeling, an investigator may have data on a number of predictor variables that
might be incorporated into a model. Some of these variables may be irrelevant or may duplicate
the information provided by other variables. The problem then is how to detect and eliminate the
duplicating variables. However, for the sake of brevity and level of presentation, we will not go into
these issues of model selection.


EXERCISES 14.5 


14.5.1. For the data of Exercise 14.4.5: 
(a) Obtain a dotplot. 
(b) Describe the data, such as mean, median, and standard deviation. 
(c) Obtain a 95% confidence interval for the mean. 
(d) Obtain a 95% prediction interval. 
(e) Explain your solutions and state any assumptions. 


14.5.2. For the gas mileage data of Exercise 14.4.7: 
(a) Obtain a dotplot. 
(b) Describe the data, such as mean, median, and standard deviation. 
(c) Obtain a 95% confidence interval for the mean. 
(d) Obtain a 95% prediction interval. 


14.5.3. The following represents the midterm and final exam scores for 35 randomly selected 
students from a large mathematics class.

Midterm: | 67 | 63 | 39 | 80 | 64 | 95 | 90 | 93 | 21 | 36
         | 44 | 66 | 66 | 72 | 34 | 78 | 66 | 68 | 98 | 43
         | 74 | 81 | 71 | 100 | 60 | 50 | 81 | 66 | 90 | 89
         | 86 | 49 | 77 | 63 | 58
Final:   | 29 | 33 | 100 | 33 | 55 | 20 | 10 | 5 | 67 | 64
         | 71 | 25 | 34 | 66 | 28 | 34 | 16 | 27 | 32 | 20
         | 14 | 21 | 16 | 62 | 50 | 14 | 61 | 11 | 14 | 41
         | 52 | 35 | 37 | 51 | 43


(a) Draw a scatterplot. 
(b) Use appropriate transformation (if necessary) to linearize the scatterplot. 
(c) Fit the data to an appropriate curve and explain the usefulness. 


14.5.4. For the state finance data of Exercise 14.2.3: 
(a) Draw a scatterplot. 
(b) Fit a least-squares line. 
(c) Explain your solutions and state any assumptions. 


14.5.5. Table 14.5.1 gives in-state tuition costs (in dollars) and 4-year graduation rate (%) for 15 
randomly selected colleges taken from a list of the 100 best values in public colleges (source: 
Kiplinger’s Magazine, October 2000). 


Table 14.5.1 


In-state tuition: 3788 4065 2196 7360 5212 4137 4060 4354 


Graduation rate: 45 64 40 58 38 20 39 48 


In-state tuition: 3956 3975 7395 4058 3683 3999 3156 


Graduation rate: 40 20 45 39 39 20 9 48 


(a) Draw a scatterplot. 

(b) Fit a least-squares line and graph it. 

(c) Looking at the scatterplot of part (a), do you think the least-squares line is a good 
choice? Discuss. 


14.6 PARAMETRIC VERSUS NONPARAMETRIC ANALYSIS 


Up until Chapter 11, we basically assumed that random variables belong to specific families of
probability distributions, such as the normal or the binomial family. The members of such a family
are distinguished by different values of parameters such as means or variances. Most of our efforts
were concentrated on making inferences about the unknown parameters. In this vein, we looked
at point estimators, confidence intervals, and hypothesis testing problems. In practice the assumption
that observations come from a particular family of distributions such as normal or exponential may
be quite sensible. As we have already mentioned, slight violations of these assumptions in many
practical cases may not significantly affect statistical inferences. However, this is not always true.
Furthermore, sometimes we may want to make inferences that have nothing to do with parameters. We
may not even have precise measurement data, but only the rank order of observations. For example,
if we want to study the performance of students at an institution, we may not have the precise scores
the students obtained; instead we may only have their letter grades such as A, B, C, D, and F. Even if
we have precise measurements, we may not be able to assume a distribution, such as normality. Still,
we may be able to say that the distribution is symmetric, or skewed, or has some other characteristics.
Basically, if there is doubt about the parametric assumptions, or the data are not suitable for
parametric inference, or we are not interested in inference about parameters, a nonparametric test that
is valid under weaker assumptions is preferable. It should be noted that weaker assumptions do not
mean that nonparametric methods are assumption free. The inferences that can be made depend on
the validity of the assumptions that are made.


When using nonparametric tests, a common question is “Why substitute a set of nonnormal numbers,
such as ranks, for the original data?” Rank tests are often useful in circumstances when we have no
idea about the population distribution. We suspect that the data are not normal, and either we cannot
transform the data to make them more normal, or we do not wish to do so. Few data are truly normal,
despite the robustness of common parametric tests; unless we are quite sure that the nonnormality is
a minor problem and would not affect the conclusions, we may often be better off using a rank test.
However, there is a small penalty for using rank tests. If the original data are really normal, in
the long run, the rank tests will be about 95.5% as efficient as a Student t-test would have been. This
means that in such situations, the t-test will require about 95 observations to achieve what the rank
test achieves with 100. But when data are far from normal, the rank tests will require fewer
observations than the t-test; in fact, we should not use the t-test in such cases.


Basically, if we know the distribution of the underlying population, we can use parametric tests. 
Otherwise, for a given data set, we first perform the normality test as explained in Section 14.3. If 
normality fails, in general, we can use nonparametric methods for data analysis. 
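The normal plots produced by Minitab in this chapter report an Anderson-Darling statistic, so a normality check of this kind can be sketched with that statistic. The function below computes the basic A-squared value with the mean and standard deviation estimated from the sample; whether Minitab applies a further small-sample adjustment is not assumed here, and no p-value is computed. The function name and both data sets are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def anderson_darling(data):
    """Basic Anderson-Darling A^2 statistic for normality
    (mean and standard deviation estimated from the data)."""
    x = sorted(data)
    n = len(x)
    m, s = mean(x), stdev(x)
    std_normal = NormalDist()
    cdf = [std_normal.cdf((v - m) / s) for v in x]
    total = 0.0
    for i in range(n):
        # Pair the i-th smallest value with the i-th largest value.
        total += (2 * i + 1) * (math.log(cdf[i]) + math.log(1.0 - cdf[n - 1 - i]))
    return -n - total / n


symmetric = [4.1, 4.8, 5.0, 5.2, 5.5, 5.9, 6.0, 6.3, 6.8, 7.4]   # hypothetical
skewed = [1, 1, 2, 3, 4, 5, 6, 7, 9, 91]                         # hypothetical
print(anderson_darling(symmetric))
print(anderson_darling(skewed))
```

Larger values of the statistic indicate stronger departure from normality; a sample with one extreme value yields a much larger statistic than a roughly symmetric one, which is the situation in which a nonparametric method would be considered.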


Another situation in which we can use nonparametric tests is when the data contain some outliers. A 
box plot or a normal plot, as explained in Section 14.3, will reveal the existence of outliers. However, 
in many applied areas such as in most bioavailability data, there will appear to be outliers. It is not 
feasible to determine whether these are skewed or contaminated distributions. They are not errors. 
In those situations, a conservative approach will be to use nonparametric methods. For example, 
because the statistic for the rank sum test is resistant to outliers, it will not be seriously affected by 
the presence of outliers unless the number of outliers becomes large relative to the sample size. 
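The resistance of the rank sum statistic to outliers can be seen in a small sketch. The data below are hypothetical; the function `rank_sum` simply adds the ranks of the first sample within the combined sample, using midranks for ties.

```python
from statistics import mean


def rank_sum(sample_a, sample_b):
    """Sum of the ranks of sample_a within the combined sample (midranks for ties)."""
    combined = sorted(sample_a + sample_b)

    def midrank(value):
        first = combined.index(value) + 1      # rank of the first occurrence
        count = combined.count(value)          # number of tied values
        return first + (count - 1) / 2.0

    return sum(midrank(v) for v in sample_a)


a = [12, 15, 11, 14, 13]
b = [16, 18, 17, 19, 20]
b_with_outlier = [16, 18, 17, 19, 200]         # largest value replaced by an outlier

print(rank_sum(a, b), rank_sum(a, b_with_outlier))
print(mean(b), mean(b_with_outlier))
```

Replacing the largest observation by an extreme value leaves every rank, and hence the rank sum, unchanged, while the sample mean of the second group triples. A statistic based on means, such as the t statistic, would be affected accordingly.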


It should be noted that we ought to be careful even when we use nonparametric tests. For example, 
if the data for one or both of the samples to be analyzed by a rank sum test come from a population 
whose distribution violates the assumption that the distributional shapes are the same, then the rank 
sum test on the original data may provide misleading results or may not be the most powerful test 
available. Transforming the data (for example, a logarithmic transformation pulls in long tails) to 
obtain normality and then performing a two-sample t-test, or using another nonparametric test, may 
be more appropriate for the analysis. In general, nonparametric methods are appropriate when the
sample sizes are small. When the data set is large, say n > 100, it often makes little sense to use
nonparametric statistics.


Finally, we conclude that nonparametric tests should be performed on a given set of data only when
necessary, that is, when we cannot assume a classical probability distribution that characterizes the
given data. Also, parametric statistical analysis is, in general, more powerful than nonparametric
analysis when its assumptions hold. We will end this section with a quote from W. J. Conover:
“Nonparametric methods use approximate solutions to exact problems, while parametric methods use
exact solutions to approximate problems.”


EXERCISES 14.6 


14.6.1. Consider the following data. 
0.01 0.012 0.016 0.018 0.036 0.042 0.036 0.048 
0.072 0.042 0.22 0.096 0.76 0.055 0.13 0.016 


(a) Test for normality and comment whether a parametric or nonparametric test is 
appropriate. 

(b) Try a suitable transformation (filter) to make the transformed data normal, if possible, 
and then use a parametric procedure. 


14.6.2. For the Medicare data of Exercise 14.4.4, if parametric procedures are not appropriate, use 
a nonparametric procedure. 


14.7 TYING IT ALL TOGETHER 


We now apply the standard methods of this chapter to some real data.
Software reliability is a major aspect of any kind of software development. One way to assess
it is to observe the time to failure and/or the time between failures (TBF). If the defects are fixed, we
would expect, on average, the TBF to increase. Based on such data, one studies the reliability of the
software. There are a variety of methods to analyze software reliability problems. Here we will not
dwell on the reliability issues. We will only do some simple data analysis on a set of software failure
data. The following data represent software failure times in the Apollo 8 software system. They were
obtained from www.dacs.dtic.mil/databases/sled/swrel.shtml. It is assumed that these failure times are
random.


Example 14.7.1 
The following data set consists of 26 software failure times taken from testing of the Apollo 8 software 
system. 


T:     9   21   32   36   43   45   50   58   63
      70   71   77   78   87   91   92   95   98
     104  105  116  149  156  247  249  250
TBF:   9   12   11    4    7    2    5    8    5
       7    1    6    1    9    4    1    3    3
       6    1   11   33    7   91    2    1
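The TBF values are the successive differences of the failure times T. The following sketch recomputes them from the 26 failure times, assuming (as the first TBF value of 9 suggests) that the first gap is measured from time 0. The function name `time_between_failures` is hypothetical.

```python
from statistics import mean, median

# The 26 failure times T from the example.
T = [9, 21, 32, 36, 43, 45, 50, 58, 63,
     70, 71, 77, 78, 87, 91, 92, 95, 98,
     104, 105, 116, 149, 156, 247, 249, 250]


def time_between_failures(times):
    """Successive differences of the failure times, starting from time 0."""
    previous = 0
    gaps = []
    for t in times:
        gaps.append(t - previous)
        previous = t
    return gaps


tbf = time_between_failures(T)
print(tbf)
print(round(mean(tbf), 2), median(tbf))
```

The resulting mean and median can be compared with the values reported by the describe command in the solution.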


(a) Obtain a dotplot and describe the TBF data. 

(b) Identify any outliers and test for normality with and without outliers for TBF data. If the data are not 
normal, does any simple transformation make the data normal? 

(c) Obtain a 95% confidence interval for TBF. 

(d) For estimation problems, does a parametric or nonparametric method seem more appropriate for 
the data? 

(e) Obtain a scatterplot between T and TBF and discuss its usefulness. 


Solution 
(a) The dotplot for the TBF data is shown in Figure 14.22. 


FIGURE 14.22 Dotplot of TBF data.


The following is the result of the describe command from Minitab. 


        N   MEAN   MEDIAN   TRMEAN   STDEV   SEMEAN
TBF    26   9.62     5.50     6.58   17.79     3.49

         MIN     MAX     Q1     Q3
TBF     1.00   91.00   2.00   9.00


(b) We will use the box plot shown in Figure 14.23 to identify the outliers.

FIGURE 14.23 Box plot of TBF data.


From the box plot the observations 33 and 91 are outliers. 
Figures 14.24 and 14.25 show the normal plots with and without outliers.

Normal plot with outliers (Probability versus TBF)

Average: 9.61539     Anderson-Darling Normality Test
Std Dev: 17.7878     A-Squared: 5.075
N of data: 26        p-value: 0.000

FIGURE 14.24 Normal probability plot of TBF data with outliers.


Normal plot without outliers

Average: 5.25        Anderson-Darling Normality Test
Std Dev: 3.50466     A-Squared: 0.504
N of data: 24        p-value: 0.184

FIGURE 14.25 Normal probability plot of TBF data without outliers.

Normal plot of ln(TBF) (Probability versus C4)

Average: 1.56762     Anderson-Darling Normality Test
Std Dev: 1.10478     A-Squared: 0.576
N of data: 26        p-value: 0.121

FIGURE 14.26 Normal probability plot of transformed TBF data with outliers.


It is clear that the data with outliers are not normal, whereas if we remove the outliers, the data
become approximately normal.
Figure 14.26 gives the normal plot obtained by taking the natural log of the TBF data with outliers.
The figure shows that the transformed data are approximately normal.

(c) To obtain a small-sample confidence interval, the normality assumption must be satisfied, so we
use the data without the outliers. A 95% confidence interval for the mean TBF with the outliers
removed is (3.77, 6.73). Running a nonparametric Wilcoxon procedure in Minitab for the 95%
confidence interval with the outliers included gave the following.

              ESTIMATED    ACHIEVED
         N    MEDIAN       CONFIDENCE    CONFIDENCE INTERVAL
TBF     26    6.00         94.9          (4.00, 8.00)


(d) If we are analyzing the data without outliers or the log-transformed data, parametric methods are
better. With the original data, because the normality assumption may not be appropriate, we need
to use nonparametric methods.


(e) Figure 14.27 gives the scatterplot of T and TBF.

FIGURE 14.27 Scatterplot of T and TBF.


Example 14.7.2
Table 14.4 gives dealer cost and sticker price for four-door base models of 25 small and midsize cars (source: 
Money Magazine, March 2001). 
(a) Obtain a dotplot and describe the sticker price data. 
(b) Identify any outliers and test for normality with and without outliers for sticker price data. If the 
data are not normal, does any simple transformation make the data normal? 
(c) Obtain a 95% confidence interval for sticker price. 
(d) For estimation problems, do parametric or nonparametric methods seem more appropriate for the 
data? 
(e) Obtain a scatterplot between dealer cost and sticker price. 
(f) Fit a least-squares regression line and run a residual model diagnostic using Minitab. 


Table 14.4

Model                     Dealer cost     Sticker price
                          (in dollars)    (in dollars)

Acura Integra GS          19,479          21,600
Chevy Cavalier            12,398          13,260
Chevy Impala LS           21,251          23,225
Chrysler Concord LX       20,834          22,510
Dodge Neon SE             11,856          12,715
Ford Escort               12,277          12,970
Ford Taurus SE            17,606          19,035
Honda Civic DX            11,723          12,960
Honda Accord 2.3 LX       16,727          18,790
Hyundai Sonata            13,805          14,999
Kia Sephia                 9,914          10,595
Mazda 626 LX V6           18,181          19,935
Mitsubishi Mirage ES      12,534          13,627
Mercury Sable GS          17,777          19,185
Nissan Maxima GXE         19,430          21,249
Oldsmobile Intrigue GL    22,097          24,150
Pontiac Grand Am GT       18,790          20,535
Saturn SL                  9,936          10,570
Subaru Impreza L          14,695          15,995
Toyota Corolla LE         12,042          13,383
Toyota Camry LE           18,169          20,415
Toyota Prius              18,793          19,995
VW Jetta GLS              15,347          16,500
VW Passat GLS             19,519          21,450
Volvo S40                 22,090          23,500

Solution 


(a) The dotplot for the sticker price is shown in Figure 14.28.
The following summary statistics are obtained by the describe command in Minitab.

             N    MEAN   MEDIAN   TRMEAN   STDEV   SEMEAN
St.price    25   17726    19035    17758    4278      856

             MIN     MAX      Q1      Q3
St.price   10570   24150   13322   21350

FIGURE 14.28 Dotplot for the sticker price.


(b) The box plot for the sticker price is shown in Figure 14.29.

FIGURE 14.29 Box plot for the sticker price.

According to this, there are no outliers. The normal plot is shown in Figure 14.30.

Normal plot for the sticker price (vertical axis: Probability)

Average: 17725.9     Anderson-Darling Normality Test
Std Dev: 4277.81     A-Squared: 0.721
N of data: 25        p-value: 0.052

FIGURE 14.30 Normal plot for the sticker price.

This is approximately normal.


(c) The 95% confidence interval for the sticker price is

             N    MEAN   STDEV   SE MEAN   95.0 PERCENT C.I.
St.price    25   17726    4278       856   (15960, 19492)

(d) Because there are no outliers and the data look approximately normal, parametric tests seem to be
appropriate for these data.

(e) The scatterplot for dealer cost versus sticker price is shown in Figure 14.31.

FIGURE 14.31 Scatterplot for dealer cost versus sticker price.

(f) Figure 14.32 shows the fitted regression line.

Dealer cost vs. Sticker price
Y = -191.308 + 1.09984X
R-Squared = 0.995

FIGURE 14.32 Regression line for dealer cost versus sticker price.

An analysis of residuals by Minitab gives Figure 14.33.

Residual model diagnostics (panels: normal plot of residuals, histogram of residuals)

FIGURE 14.33 Residuals versus fit.

By looking at the residuals versus fits, we can see that we have a good fit, and hence the model looks
appropriate.
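For readers without access to Minitab, the least-squares fit of part (f) can be sketched directly from the usual formulas. The code below uses the 25 pairs of Table 14.4 with dealer cost as X and sticker price as Y, which is the reading suggested by the fitted equation; the function name `least_squares` is hypothetical, and the sketch does not reproduce Minitab's residual diagnostics beyond listing the residuals.

```python
from statistics import mean

dealer_cost = [19479, 12398, 21251, 20834, 11856, 12277, 17606, 11723, 16727,
               13805, 9914, 18181, 12534, 17777, 19430, 22097, 18790, 9936,
               14695, 12042, 18169, 18793, 15347, 19519, 22090]
sticker_price = [21600, 13260, 23225, 22510, 12715, 12970, 19035, 12960, 18790,
                 14999, 10595, 19935, 13627, 19185, 21249, 24150, 20535, 10570,
                 15995, 13383, 20415, 19995, 16500, 21450, 23500]


def least_squares(x, y):
    """Intercept, slope, and R-squared of the least-squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


intercept, slope, r_squared = least_squares(dealer_cost, sticker_price)
residuals = [y - (intercept + slope * x)
             for x, y in zip(dealer_cost, sticker_price)]
print(intercept, slope, r_squared)
print(max(abs(r) for r in residuals))
```

The printed coefficients can be compared with the equation reported in Figure 14.32, and plotting the residuals against the fitted values gives the same kind of check as the residuals-versus-fits panel.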


EXERCISES 14.7 
14.7.1. Table 14.7.1 gives revenue (in thousands) for public elementary and secondary schools, by
state, for 1997-1998 and corresponding pupils per teacher for that state for 20 randomly
selected states (source: The World Almanac and Book of Facts 2000).
(a) Obtain a dotplot and describe the pupils per teacher data.
(b) Identify any outliers and test for normality with and without outliers for the pupils per
teacher data. If the data are not normal, does any simple transformation make the data
normal?
(c) Obtain a 95% confidence interval for pupils per teacher.
(d) Obtain a scatterplot between total revenue and pupils per teacher.
(e) Fit a regression line between total revenue and pupils per teacher.


Table 14.7.1

State Total revenue Pupils per teacher
Arizona 4,388,915 19.8 
Connecticut 5,112,950 14.2 
Alabama 4,030,356 16.3 
Indiana 7,006,752 17.2 
Kansas 3,090,829 14.9 
Oregon 3,119,028 20.1 
Nebraska 1,688,662 14.5 
New York 27,690,556 15.0 
Virginia 6,661,612 14.7 
Washington 6,722,916 20.2 
Illinois 13,649,628 16.8 
North Carolina 7,127,549 15.9 
Georgia 8,579,628 16.2 
Nevada 1,754,717 18.5 
Ohio 12,694,407 16.7 
New Hampshire 1,365,391 15.6 


14.7.2. Table 14.7.2 gives the dealer cost and sticker price for luxury cars and sports utility vehicles
with popular options (source: Money Magazine, March 2001).

(a) Obtain a dotplot and describe the sticker price data.
(b) Identify any outliers and test for normality with and without outliers for sticker
price data. If the data are not normal, does any simple transformation make the data
normal?
(c) Obtain a 95% confidence interval for sticker price.
(d) Do parametric or nonparametric methods seem more appropriate for the data?
(e) Obtain a scatterplot between dealer cost and sticker price.
(f) Fit a least-squares regression line and run residual model diagnostics using Minitab.


14.7.3. For the college tuition data of Exercise 14.5.5, fit a least-squares regression line and run
residual model diagnostics using Minitab.

Table 14.7.2

Model Dealer cost (in dollars) Sticker price (in dollars)

Acura TL 3.2 26,218 29,030 
Audi A6 4.2 45,385 50,754 
BMW 525i 33,800 37,245 
Cadillac DeVille DHS4 43,825 47,603 
Infiniti I30 Touring 28,604 32,065
Jaguar XJ8 52,535 58,171 
Lexus GS430 41,881 48,581 
Mercedes-Benz C320 35,067 36,950 
SAAB 9-3 Viggen 35,270 38,690 
Volvo S80T-6 39,315 41,768 
BMW X5 4.4i 45,994 50,774 
Chevrolet Blazer LT 26,958 29,725 
Dodge Durango 26,845 29,370 
GMC Jimmy SLE 26,637 29,370 
Honda CR-V LX 17,578 19,190 
Isuzu Trooper LS 27,901 31,285 
Jeep Cherokee SE 21,392 23,130 
Lexus LX470 54,785 63,474 
Mercedes-Benz ML430 42,243 45,337 
Nissan Pathfinder SE 27,203 29,869 
Pontiac Aztek GT 22,912 24,995 
Subaru Forester S 21,990 24,190 
Suzuki Vitara JS 16,063 17,079 
Toyota RAV4 18,786 20,630

14.7.4. The following data give the area (in square feet) and the sale prices (approximated to the
nearest $1000) of homes that were sold in a particular city in a 6-week period of 2003.


Area: 1123 1028 1490 2172 2300 1992 3200 3063 3720 
7228 720 943 904 912 1031 1152 1482 1426 
1491 1184 1650 1392 1755 2062 2495 3253 5152 
1270 1723 1161 1220 837 1446 2442 2300 2518 


Price: 75 75 102 149 152 154 327 425 625 
775 53 57 66 68 75 86 90 93 
95 95 104 105 135 159 169 253 725 
67 85 110 65 74 95 156 183 207 


(a) Obtain a dotplot and describe the home price data. 

(b) Identify any outliers and test for normality with and without outliers for home price 
data. If the data are not normal, does any simple transformation make the data normal? 

(c) Obtain a 95% confidence interval for home price. 

(d) Do parametric or nonparametric methods seem more appropriate for the data? 

(e) Obtain a scatterplot between the square-foot area of a home and its price. 

(f) Fit a least-squares regression line and run residual model diagnostics using Minitab.


14.8 CONCLUSION 


We have briefly discussed some of the problems that arise in applied data analysis. However, this
discussion is not exhaustive. There are various other special problems that can arise in applied data
analysis. For example, if one or both of the sample sizes are small, violations of assumptions such
as equality of variances are hard to detect. Also, for small sample sizes, a possible outlier whose
status is in doubt may have undue influence on the inferences. It is better to avoid such problems
in the design stage of an experiment, when suitable sample sizes can be determined before we start
collecting data.


Differences in distributional shapes can influence the testing procedures of two or more samples. In
those cases, a transformation may settle that problem and may also promote normality as
well as correct the problem of inequality of variances. There are also many issues related to
simulation, discussed in Chapter 13, that arise when empirical methods are used: for instance, in the
application of MCMC methods, the issues of burn-in, choice of the correct proposal function, and
convergence. These are beyond the scope of this book.


Combining the issues discussed in this chapter with the rest of the material of this textbook should 
give the student a good footing in the theory of statistics as well as the ability to deal with many 
real-world problems. 


Set Theory 


In this appendix, we present some of the basic ideas and concepts of set theory that are essential 
for a modern introduction to probability and statistics. The origin of set theory is credited to Georg 
Cantor, when he proved the uncountability of the real line in 1873. A set is defined as a collection of 
well-defined distinct objects. These objects of a set are called elements or members. The elements of a 
set can be anything: the alphabet, numbers, people, other sets, and so forth. Sets are conventionally 
denoted with capital letters, A, B, C, and so on. A universal set, denoted by S, is the collection of all
possible elements under consideration. If a is an element of a set A, we write $a \in A$. If a is not an
element of A, we write $a \notin A$.


A set is described either by listing its elements or by stating the properties that characterize the elements
of the set. For example, to specify the set A of all positive integers less than 12, we may write

$$A = \{1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11\} = \{\text{all positive integers less than } 12\} = \{x : x < 12,\ x \text{ is a positive integer}\}.$$


Sets are classified as finite or infinite. A set is finite if it contains exactly n objects, where n is a 
nonnegative integer. A set is infinite if it is not finite. For example, if A is a set containing all positive 
integers less than or equal to 50, then A is a finite set. If B is a set containing all the positive integers, 
it is an infinite set. 


Describing a set by stating its properties is the practical way to represent a set with a large or infinite 
number of elements. 


A set B is a subset of a set A if every element of B is also an element of A. We denote this by writing
$B \subset A$, which is read "A contains B" or "B is contained in A." For example, if A is the set of real numbers
and

$$B = \{x : x < 5,\ x \text{ a positive integer}\},$$

it is clear that B is a subset of A. Also, every set is a subset of itself. Two sets A and B are
equal, $A = B$, if and only if $A \subset B$ and $B \subset A$. Thus, two sets A and B are said to be equal if
they have the same members. A set B is a proper subset of a set A if every element of B is an
element of A and A contains at least one element that is not an element of B. We denote this
relationship by $B \subsetneq A$. In the previous example, we have $B \subsetneq A$. The set that contains no
elements is called the empty set (or null set) and is denoted by $\emptyset$. The null set $\emptyset$ is a subset of
every set.
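The subset relations above can be checked on small finite sets. The following is only a sketch: because the real numbers cannot be listed, the integers 1 through 10 are used as a hypothetical stand-in for the set A.

```python
# Finite stand-in (an assumption for illustration): A = {1, ..., 10}
A = set(range(1, 11))
# B = {x : x < 5, x a positive integer}, restricted to the stand-in universe
B = {x for x in range(1, 11) if x < 5}
empty = set()

is_subset = B <= A           # B is a subset of A
is_proper = B < A            # B is a proper subset of A
self_subset = A <= A         # every set is a subset of itself
empty_subset = empty <= A    # the empty set is a subset of every set
# A = B if and only if each is a subset of the other
equal_iff = (A <= B and B <= A) == (A == B)
```

Python's `<=` and `<` on sets correspond to the subset and proper-subset relations, respectively.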


A Venn diagram is used for visual representation of sets. In the Venn diagram, the universal set, S, is 
represented by a rectangle. The subsets are represented by circles inside this rectangle. 


FIGURE AI.1 A Venn diagram.


AI.1 SET OPERATIONS


Union, $\cup$: The union of two sets A and B is the set of all elements that belong to A or B (or both;
elements that belong to both sets are included only once) and is denoted by $A \cup B$.

$$A \cup B = \{x : x \in A \text{ or } x \in B\}.$$

FIGURE AI.2 Union of two sets.


Intersection, $\cap$: The intersection of two sets A and B is the set of all elements that belong to both
A and B and is denoted by $A \cap B$.

$$A \cap B = \{x \in S : x \in A \text{ and } x \in B\}.$$

If $A \cap B = \emptyset$, then the sets A and B are said to be disjoint or mutually exclusive sets.

FIGURE AI.3 Intersection of two sets.


Complement: The complement of a set A is the set of all elements that belong to S but not to A.

$$A^{c} = \{x : x \in S,\ x \notin A\}.$$

FIGURE AI.4 Complement of a set.


The difference of any two sets, A and B, denoted by $A \setminus B$, is equal to $A \cap B^{c}$. Thus, $A^{c} = S \setminus A$. It should
be noted that $(A^{c})^{c} = A$. The symmetric difference between any two sets, A and B, denoted by $A \,\triangle\, B$, is
the set of elements in A or B, but not both, that is, $(A \setminus B) \cup (B \setminus A)$.


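PROPERTIES OF SETS
If A, B, and C are subsets of the universal set S, then they satisfy the following properties.

Commutative law
$$A \cup B = B \cup A, \qquad A \cap B = B \cap A$$

Associative law
$$A \cup (B \cup C) = (A \cup B) \cup C = A \cup B \cup C, \qquad A \cap (B \cap C) = (A \cap B) \cap C$$

Distributive law
$$A \cup (B \cap C) = (A \cup B) \cap (A \cup C), \qquad A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$$

Idempotent law
$$A \cup A = A, \qquad A \cap A = A$$

Identity law
$$A \cup S = S, \qquad A \cap S = A$$
$$A \cup \emptyset = A, \qquad A \cap \emptyset = \emptyset$$

Complement law
$$A \cup A^{c} = S, \qquad A \cap A^{c} = \emptyset$$

De Morgan's laws
$$(A \cup B)^{c} = A^{c} \cap B^{c}$$
$$(A \cap B)^{c} = A^{c} \cup B^{c}$$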
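The set operations and De Morgan's laws can be tried out on a small example. This is a minimal sketch; the universal set S and the sets A and B below are hypothetical values chosen only for illustration.

```python
# Hypothetical universal set and subsets (not taken from the text)
S = set(range(1, 13))
A = {1, 2, 3, 4, 5, 6}
B = {4, 5, 6, 7, 8}

def complement(X, universe=S):
    """Elements of the universal set that are not in X."""
    return universe - X

union = A | B            # elements in A or B (or both)
intersection = A & B     # elements in both A and B
difference = A - B       # A \ B, equal to A intersected with B^c
sym_diff = A ^ B         # (A \ B) union (B \ A)

# De Morgan's laws, checked for these particular sets
de_morgan_1 = complement(A | B) == complement(A) & complement(B)
de_morgan_2 = complement(A & B) == complement(A) | complement(B)
```

A check on one pair of sets is of course not a proof; it only illustrates what the identities assert.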


The two sets A and B are said to be in one-to-one correspondence (denoted by 1:1) if each element $a \in A$
is paired with one and only one element $b \in B$ in such a manner that each element of B is paired
with exactly one element of A. For example, if $A = \{a_1, a_2, a_3, a_4\}$ and $B = \{1, 2, 3, 4\}$, then A and B
have a 1:1 correspondence.


A set whose elements can be put into a one-to-one correspondence with the set of all positive integers
is referred to as being a countably infinite set. Also, a set is said to be countable, denumerable, or enumerable
if it is finite or countably infinite. The product or Cartesian product of sets A and B is denoted by
$A \times B$ and consists of all ordered pairs $(a, b)$, where $a \in A$ and $b \in B$, that is,

$$A \times B = \{(a, b) : a \in A,\ b \in B\}.$$

For example, if $A = \{a_1, a_2, a_3\}$ and $B = \{1, 2\}$, then

$$A \times B = \{(a_1, 1), (a_1, 2), (a_2, 1), (a_2, 2), (a_3, 1), (a_3, 2)\}.$$

The notion of a Cartesian product can be extended to any finite number of sets; that is,
$A_1 \times A_2 \times \cdots \times A_n$ is the set of all ordered n-tuples, $(a_1, a_2, \dots, a_n)$, where

$$a_1 \in A_1,\ a_2 \in A_2,\ \dots,\ a_n \in A_n.$$
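The Cartesian product in the example can be enumerated directly. The sketch below uses Python's standard `itertools.product`; the string labels "a1", "a2", "a3" are stand-ins for the elements $a_1, a_2, a_3$.

```python
from itertools import product

A = ["a1", "a2", "a3"]   # stand-ins for a_1, a_2, a_3
B = [1, 2]

# All ordered pairs (a, b) with a in A and b in B
AxB = list(product(A, B))

# The same function extends to any finite number of sets (ordered n-tuples)
AxBxB = list(product(A, B, B))
```

The number of pairs is the product of the set sizes, here 3 x 2 = 6.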


Appendix 


Review of Markov Chains 


A stochastic or random process is defined as a family of random variables, $\{X(t)\}$, describing an empirical
process, the development of which in time is governed by probabilistic laws. The state space, S, of
the stochastic process is the set of all possible values that the random variable $X(t)$ can take. The
parameter $t$ is often interpreted as time and may be either discrete or continuous. When the set of
possible values of $t$ forms a countable set, the process $\{X(t),\ t = 0, 1, 2, \dots\}$ is discrete. If $t$ ranges over an
interval of the real line, the process $\{X(t),\ t \ge 0\}$ is said to be continuous. In the discrete case, the state
space can be finite or infinite.


Among many different discrete stochastic processes, we are interested in a special class called Markov 
chains. The basic concepts of Markov chains were introduced in 1907 by the Russian mathematician 
A. A. Markov. 


Let $i_1, i_2, \dots$ represent the states of the chain. The sequence of random variables $X_1, X_2, \dots$ is called
a Markov chain if

$$P(X_n = i_n \mid X_1 = i_1, X_2 = i_2, \dots, X_{n-1} = i_{n-1}) = P(X_n = i_n \mid X_{n-1} = i_{n-1}).$$


An intuitive interpretation is that a stochastic process { X (t)} has the Markov property if the conditional 
probability of any future state, given the present and past states, is independent of the past states and 
depends only on the present state. Thus, a Markov chain can be used to model the position of an 
object in a discrete set of possible states over time, in which the subsequent position is chosen at 
random from a distribution that depends only on the current location of the chain and not on any 
previous locations of the chain. 


The conditional probabilities that the chain moves to state j at time n, given that it is in state i at time
n − 1, are called transition probabilities and are denoted by $p_{ij}$,

$$p_{ij} = P(X_n = j \mid X_{n-1} = i),$$

with the subscript ij of p indicating the direction of transition $i \to j$. Sometimes, $p_{ij}$ may also
be represented by $p(i, j)$, and if we need to represent the time points, then we use the notation
$p_{n-1,n}(i, j) = P(X_n = j \mid X_{n-1} = i)$.


Two basic assumptions we make are that (i) $p_{ij} \ge 0$ for all i and j; that is, the transition probabilities are
nonnegative. Also, (ii) for every i,

$$\sum_{j=1}^{\infty} p_{ij} = 1 \qquad \Bigl(\sum_{j=1}^{n} p_{ij} = 1 \text{ if the state space is finite}\Bigr),$$

that is, the chain makes a transition to some state in the state space.


If the transition probabilities $p_{ij}$ depend only on the states i and j and not on the time n, then the
conditional probabilities are called stationary. Markov chains with stationary probabilities are called
(time) homogeneous Markov chains. We shall consider only homogeneous Markov chains.


The behavior of homogeneous Markov chains is described by the transition or stochastic matrices of
the processes where the transition probabilities are arranged as elements of a matrix. The transition
or stochastic matrix of a chain having transition probabilities $p_{ij}$, $i, j = 1, 2, \dots, n$, is

$$P = \begin{pmatrix} p_{11} & \cdots & p_{1n} \\ \vdots & \ddots & \vdots \\ p_{n1} & \cdots & p_{nn} \end{pmatrix}.$$


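In the infinite state space case, we represent the transition matrix in the following manner:

$$P = \begin{pmatrix} p_{11} & \cdots & p_{1n} & \cdots \\ \vdots & & \vdots & \\ p_{m1} & \cdots & p_{mn} & \cdots \\ \vdots & & \vdots & \end{pmatrix}.$$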


Each element of the matrix is nonnegative, and each row sums to 1. If we look at any particular row,
say the mth row, then we can see the probabilities of going from state m to the various other states
including the state m.


Example AII.1
Four quarterbacks are warming up by throwing a football to one another. Let 1, 2, 3, and 4 denote the four
quarterbacks. It has been observed that 1 is equally likely to throw the ball to 2, 3, or 4. Player 2 never
throws to 3 but splits his throws evenly between 1 and 4. Quarterback 3 throws twice as many passes to 1 as to 4
and never to 2, but 4 throws only to 1. This process forms a Markov chain because the player who is about
to throw the ball is not influenced by the player who had the ball before him. The one-step transition
matrix is

$$P = \begin{pmatrix} 0 & 1/3 & 1/3 & 1/3 \\ 1/2 & 0 & 0 & 1/2 \\ 2/3 & 0 & 0 & 1/3 \\ 1 & 0 & 0 & 0 \end{pmatrix}.$$
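The two basic assumptions on transition probabilities (nonnegative entries, each row summing to 1) can be verified for the matrix of Example AII.1. This is a small sketch using exact fractions; the helper name `is_stochastic` is introduced here for illustration only.

```python
from fractions import Fraction as F

# One-step transition matrix of Example AII.1 (rows: current player 1..4)
P = [
    [F(0),    F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 2), F(0),    F(0),    F(1, 2)],
    [F(2, 3), F(0),    F(0),    F(1, 3)],
    [F(1),    F(0),    F(0),    F(0)],
]

def is_stochastic(M):
    """True if every entry is nonnegative and every row sums to exactly 1."""
    nonnegative = all(p >= 0 for row in M for p in row)
    rows_sum_to_one = all(sum(row) == 1 for row in M)
    return nonnegative and rows_sum_to_one

valid = is_stochastic(P)
```

Exact fractions avoid the rounding that would make a floating-point row sum differ slightly from 1.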


Following is a standard example of a chain with infinite state space.


Example AII.2
Consider a chain with state space $S = \{0, 1, 2, 3, \dots\}$ and transition matrix

$$P = \begin{pmatrix} r_0 & p_0 & 0 & 0 & \cdots \\ q_1 & r_1 & p_1 & 0 & \cdots \\ 0 & q_2 & r_2 & p_2 & \cdots \\ \vdots & & \ddots & \ddots & \ddots \end{pmatrix},$$

where $p_i, q_i, r_i \ge 0$ for all $i \ge 0$, $p_0 + r_0 = 1$, and $p_i + q_i + r_i = 1$ for all $i \ge 1$. Thus, for this Markov chain,
the transition probabilities are $p_{00} = r_0$, $p_{01} = p_0$, and for $i \ne 0$,

$$p_{ij} = \begin{cases} p_i, & j = i + 1 \\ r_i, & j = i \\ q_i, & j = i - 1 \\ 0, & \text{otherwise.} \end{cases}$$

This chain is known as the random walk chain (with barrier at 0).

The following example gives a transition matrix for the random walk chain in a special case. We can
think of this as a chain resulting from tossing of a fair coin. If we are not at state zero, then if heads
comes up, we take a step to the right and if tails comes up, we take a step to the left. If at state 0, we
remain at zero for a tails outcome and move a step to the right for heads.


Example AII.3
Consider a Markov chain with state space $S = \{0, 1, 2, 3, \dots\}$ and the transition probabilities given by

$$p_{00} = 1/2, \qquad p_{ij} = \begin{cases} 1/2, & j = i - 1 \\ 1/2, & j = i + 1 \\ 0, & \text{otherwise.} \end{cases}$$

This results in the symmetric transition matrix with elements

$$A = \begin{pmatrix} 1/2 & 1/2 & 0 & 0 & 0 & \cdots \\ 1/2 & 0 & 1/2 & 0 & 0 & \cdots \\ 0 & 1/2 & 0 & 1/2 & 0 & \cdots \\ 0 & 0 & 1/2 & 0 & 1/2 & \cdots \\ \vdots & & & & & \ddots \end{pmatrix}.$$
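Because the state space of Example AII.3 is infinite, only a finite corner of the transition matrix can be written down. The sketch below builds the top-left `size` x `size` corner; the function name `random_walk_corner` and the truncation itself are assumptions made for illustration.

```python
from fractions import Fraction as F

def random_walk_corner(size):
    """Top-left size-by-size corner of the fair-coin random walk matrix."""
    half = F(1, 2)
    M = [[F(0)] * size for _ in range(size)]
    M[0][0] = half                 # tails at state 0: stay at 0
    if size > 1:
        M[0][1] = half             # heads at state 0: step right
    for i in range(1, size):
        M[i][i - 1] = half         # tails: step left
        if i + 1 < size:
            M[i][i + 1] = half     # heads: step right
    return M

corner = random_walk_corner(5)
```

Note that the last row of the truncated corner sums to 1/2 rather than 1, because the step to the right leads to a state outside the corner.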


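The n-step transition probability, $p_{ij}^{(n)}$, is defined as the probability that the chain, currently in state i, will be in
state j after n steps. If $p_{ij}$ is the one-step transition probability, $p_{ij}^{(n)}$ can be obtained as follows. Let i be the
state of the process at time m, that is, $X_m = i$. Then, the n-step transition probability is

$$p_{ij}^{(n)} = P(X_{m+n} = j \mid X_m = i),$$

which for a homogeneous chain does not depend on m. Splitting a path of $n + r$ steps at the state k occupied after the first n steps gives

$$p_{ij}^{(n+r)} = \sum_{k \in S} p_{ik}^{(n)}\, p_{kj}^{(r)}.$$

This can be rewritten in the matrix notation as

$$P^{(n+r)} = P^{(n)} P^{(r)} = P^{(r)} P^{(n)}.$$

This is known as the Chapman-Kolmogorov equation.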


The following example shows how to compute an n-step transition matrix.


Example AII.4
Consider the one-step transition matrix given in Example AII.1,

$$P = \begin{pmatrix} 0 & 1/3 & 1/3 & 1/3 \\ 1/2 & 0 & 0 & 1/2 \\ 2/3 & 0 & 0 & 1/3 \\ 1 & 0 & 0 & 0 \end{pmatrix}.$$

The two-step transition matrix, $P^2$, is

$$P^2 = P \cdot P = \begin{pmatrix} 0 & 1/3 & 1/3 & 1/3 \\ 1/2 & 0 & 0 & 1/2 \\ 2/3 & 0 & 0 & 1/3 \\ 1 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} 0 & 1/3 & 1/3 & 1/3 \\ 1/2 & 0 & 0 & 1/2 \\ 2/3 & 0 & 0 & 1/3 \\ 1 & 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 13/18 & 0 & 0 & 5/18 \\ 1/2 & 1/6 & 1/6 & 1/6 \\ 1/3 & 2/9 & 2/9 & 2/9 \\ 0 & 1/3 & 1/3 & 1/3 \end{pmatrix}.$$

The three-step transition matrix, $P^3$, is

$$P^3 = P \cdot P^2 = \begin{pmatrix} 5/18 & 13/54 & 13/54 & 13/54 \\ 13/36 & 1/6 & 1/6 & 11/36 \\ 13/27 & 1/9 & 1/9 & 8/27 \\ 13/18 & 0 & 0 & 5/18 \end{pmatrix}.$$

For instance, the third row of $P^3$,

$$(13/27 \quad 1/9 \quad 1/9 \quad 8/27),$$

denotes that, if player 3 starts with the ball, then after three throws the ball is in the hands of players 1, 2, 3, and 4 with respective probabilities
13/27, 1/9, 1/9, and 8/27.
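The two-step and three-step matrices of Example AII.4 can be reproduced by repeated matrix multiplication. The following sketch uses exact fractions and a hand-written helper `mat_mul` (a name introduced here, not from the text) so that no external library is assumed.

```python
from fractions import Fraction as F

# One-step transition matrix of Example AII.1
P = [
    [F(0),    F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 2), F(0),    F(0),    F(1, 2)],
    [F(2, 3), F(0),    F(0),    F(1, 3)],
    [F(1),    F(0),    F(0),    F(0)],
]

def mat_mul(X, Y):
    """Product of two square matrices stored as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P2 = mat_mul(P, P)     # two-step transition matrix
P3 = mat_mul(P, P2)    # three-step transition matrix

third_row_of_P3 = P3[2]   # distribution after three throws, starting from player 3
```

Multiplying in either order, `mat_mul(P, P2)` or `mat_mul(P2, P)`, gives the same matrix, in line with the Chapman-Kolmogorov equation.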


A transition matrix, P, all entries of which are positive, is called a positive transition matrix. A state j of
a Markov chain is accessible from a state i if $p_{ij}^{(n)} > 0$ for some $n \ge 0$. If state j is accessible from state i,
and state i is accessible from state j, the states are said to communicate. If all the states communicate,
then the Markov chain is called irreducible. A state i is periodic (of period d) if the only way to revisit
it is through steps of length $kd$ for some value of k and a fixed value of $d > 1$. Thus, the period, d,
is the greatest common divisor of the number of steps n needed for the chain, starting at state i, to
revisit the state i:

$$d = \mathrm{GCD}\,\{n \ge 1 : p_{ii}^{(n)} > 0\}.$$

If a state is not periodic, then it is called aperiodic. A state i is recurrent if it will be revisited by the
chain with probability 1. That is,

$$P(X_n = i \text{ for infinitely many } n \mid X_0 = i) = 1.$$

If a state is not recurrent, it is called transient. Recurrent, aperiodic states are called ergodic. It is
necessary to impose an extra condition for ergodicity, that the expected recurrence time be finite.
This is satisfied for recurrent states in a finite-state Markov chain. A Markov chain is called ergodic if
every state is ergodic. It is clear that a finite state Markov chain with a positive transition matrix is
ergodic.
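For a finite chain, accessibility and the period of a state can be explored numerically. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the period is computed from return times only up to a cutoff `max_steps` (the default of twice the squared number of states is a heuristic chosen here, not a result from the text).

```python
from math import gcd

def reachable(P, i):
    """States accessible from i (including i itself, via n = 0 steps)."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v, p in enumerate(P[u]):
            if p > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_irreducible(P):
    """True if every state is accessible from every other state."""
    n = len(P)
    return all(len(reachable(P, i)) == n for i in range(n))

def period(P, i, max_steps=None):
    """GCD of the return times n <= max_steps with p_ii^(n) > 0 (0 if none)."""
    n = len(P)
    max_steps = max_steps or 2 * n * n
    current = {i}   # states that can be occupied after exactly k steps
    d = 0
    for k in range(1, max_steps + 1):
        current = {v for u in current for v, p in enumerate(P[u]) if p > 0}
        if i in current:
            d = gcd(d, k)
    return d

# Matrix of Example AII.1 and a two-state chain that alternates deterministically
quarterbacks = [[0, 1/3, 1/3, 1/3], [1/2, 0, 0, 1/2], [2/3, 0, 0, 1/3], [1, 0, 0, 0]]
flip = [[0, 1], [1, 0]]
```

For the quarterback chain, player 1 can get the ball back in 2 throws or in 3 throws, so the greatest common divisor is 1 and the state is aperiodic; the alternating chain returns only at even times and has period 2.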


The following result is of fundamental importance. 


Theorem AII.1 For an ergodic Markov chain, $\lim_{n \to \infty} p_{ij}^{(n)} = \pi_j$ exists, and this limit is independent of the
initial state i. Let the vector $\pi$ with elements $(\pi_j)$ be the limiting or the stationary distribution of the chain.
Then, this stationary probability vector is the unique solution of the equation

$$\pi = \pi P$$

and satisfies the normalization condition $\sum_{j \in S} \pi_j = 1$.


If, at any transition step n, the distribution of the chain is the same as $\pi$ obtained in Theorem AII.1,
we say that the chain has reached the steady state. Thus, the vector $\pi$ would be the unique steady-state
probability vector of the Markov chain.
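As an illustration of Theorem AII.1, the stationary vector of the quarterback chain of Example AII.1 can be found by solving $\pi = \pi P$ together with the normalization condition. The sketch below is one possible approach (Gauss-Jordan elimination with exact fractions); the function name `stationary` is an assumption, and the code presumes the chain is ergodic so that the solution is unique.

```python
from fractions import Fraction as F

P = [
    [F(0),    F(1, 3), F(1, 3), F(1, 3)],
    [F(1, 2), F(0),    F(0),    F(1, 2)],
    [F(2, 3), F(0),    F(0),    F(1, 3)],
    [F(1),    F(0),    F(0),    F(0)],
]

def stationary(P):
    """Solve pi = pi P with sum(pi) = 1 (assumes a unique solution exists)."""
    n = len(P)
    # Equations sum_i pi_i (P[i][j] - delta_ij) = 0 for j = 0..n-2;
    # the last (redundant) balance equation is replaced by the normalization.
    A = [[P[i][j] - (1 if i == j else 0) for i in range(n)] + [F(0)]
         for j in range(n - 1)]
    A.append([F(1)] * n + [F(1)])
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[r][n] for r in range(n)]

pi = stationary(P)
```

For this matrix the balance equations give $\pi_2 = \pi_3 = \pi_1/3$ and $\pi_4 = 11\pi_1/18$, so after normalization $\pi = (18/41,\ 6/41,\ 6/41,\ 11/41)$.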


Analogous to the law of large numbers for a sequence of independent random variables, for Markov 
chains we can obtain the following so-called ergodic theorem. 


Theorem AII.2 For any ergodic Markov chain $\{X_n\}$ with stationary distribution $\pi$:

$$\frac{1}{n} \sum_{k=1}^{n} f(X_k) \to \sum_{j \in S} f(j)\, \pi_j = E_{\pi} f(X) \quad \text{w.p.1.}$$


The validity of the Markov chain Monte Carlo method lies in this ergodic theorem. 
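The ergodic theorem can be illustrated by simulation. In the sketch below, the chain of Example AII.1 is simulated and the fraction of time the ball is with player 1 is compared with the stationary probability 18/41, a value obtained by solving $\pi = \pi P$ by hand for that matrix. The function name, the number of steps, and the seed are arbitrary choices made for this illustration.

```python
import random

# Transition matrix of Example AII.1 (floating-point version for simulation)
P = [
    [0,   1/3, 1/3, 1/3],
    [1/2, 0,   0,   1/2],
    [2/3, 0,   0,   1/3],
    [1,   0,   0,   0],
]

def time_average(P, f, n_steps, start=0, seed=0):
    """Average of f(X_k) over n_steps simulated transitions of the chain."""
    rng = random.Random(seed)
    states = range(len(P))
    x = start
    total = 0.0
    for _ in range(n_steps):
        # Draw the next state from row x of the transition matrix
        x = rng.choices(states, weights=P[x])[0]
        total += f(x)
    return total / n_steps

# f is the indicator of state 0 (player 1), so E f(X) under pi is pi_1
estimate = time_average(P, lambda x: 1.0 if x == 0 else 0.0, 200_000)
target = 18 / 41
```

With a long run the estimate should be close to the target regardless of the starting state, which is the property that Markov chain Monte Carlo methods rely on.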


Appendix 


Common Probability Distributions 


In this appendix, we present some common probability distributions that are useful in the
statistical methods used in this book. Many other distributions are important in particular
areas of application. A good reference can be found
at http://www.causascientia.org/math_stat/Dists/Compendium.pdf. We give the density function,
mean, variance, and moment-generating function (mgf). For some distributions, if the
mgf is complicated, we just leave it out and refer the reader to one of the references in the book.


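Name | pdf | Mean | Variance | mgf
Bernoulli distribution | $f(x, p) = p$ if $x = 1$; $1 - p$ if $x = 0$; $0$ otherwise; $0 < p < 1$ | $p$ | $p(1 - p)$ | $q + pe^{t}$, $q = 1 - p$
Binomial | $f(x, n, p) = \binom{n}{x} p^{x} q^{n-x}$, $x = 0, 1, \dots, n$ | $np$ | $npq$ | $(q + pe^{t})^{n}$
Geometric | $f(x, p) = q^{x-1} p$, $x = 1, 2, \dots$; $0 < p < 1$ | $1/p$ | $q/p^{2}$ | $pe^{t}/(1 - qe^{t})$
Hypergeometric | $f(x, N, m, n) = \binom{m}{x}\binom{N-m}{n-x} \big/ \binom{N}{n}$, $x = 0, 1, \dots, n$ (with $\binom{a}{b} = 0$ for $b > a$) | $nm/N$ | $n \left(\frac{m}{N}\right)\left(1 - \frac{m}{N}\right)\frac{N - n}{N - 1}$ | (omitted)
Negative binomial | $f(x, r, p) = \binom{x-1}{r-1} p^{r} q^{x-r}$, $x = r, r + 1, \dots$ | $r/p$ | $rq/p^{2}$ | $\left(\frac{pe^{t}}{1 - qe^{t}}\right)^{r}$
Poisson | $f(x, \lambda) = \frac{e^{-\lambda} \lambda^{x}}{x!}$, $x = 0, 1, 2, \dots$ | $\lambda$ | $\lambda$ | $\exp(\lambda(e^{t} - 1))$
Beta | $f(x, \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1}(1 - x)^{\beta - 1}$, $0 < x < 1$ | $\frac{\alpha}{\alpha + \beta}$ | $\frac{\alpha\beta}{(\alpha + \beta)^{2}(\alpha + \beta + 1)}$ | (omitted)
Chi-square | $f(x, \nu) = \frac{x^{\nu/2 - 1} e^{-x/2}}{2^{\nu/2}\, \Gamma(\nu/2)}$, $x > 0$, $\nu > 0$ (degrees of freedom) | $\nu$ | $2\nu$ | $(1 - 2t)^{-\nu/2}$
Exponential | $f(x, \lambda) = \lambda e^{-\lambda x}$ if $x > 0$; $0$ otherwise; $\lambda > 0$ | $1/\lambda$ | $1/\lambda^{2}$ | $\left(1 - \frac{t}{\lambda}\right)^{-1}$
Gamma | $f(x, \alpha, \beta) = \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}$, $x > 0$ | $\alpha/\beta$ | $\alpha/\beta^{2}$ | $\left(1 - \frac{t}{\beta}\right)^{-\alpha}$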
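Entries of such a table can be sanity-checked numerically. As a sketch, the code below recomputes the mean, variance, and mgf of a binomial distribution directly from its probability function and compares them with the closed forms $np$, $npq$, and $(q + pe^{t})^{n}$; the parameter values $n = 10$, $p = 0.3$, and $t = 0.5$ are arbitrary choices.

```python
from math import comb, exp

def binomial_pmf(x, n, p):
    """Binomial probability function C(n, x) p^x q^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3          # arbitrary illustrative parameters
q = 1 - p
xs = range(n + 1)

# Moments computed directly from the probability function
mean = sum(x * binomial_pmf(x, n, p) for x in xs)
variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in xs)

# mgf at one point t: E[exp(tX)] versus the closed form
t = 0.5
mgf_from_pmf = sum(exp(t * x) * binomial_pmf(x, n, p) for x in xs)
mgf_closed_form = (q + p * exp(t)) ** n
```

The same pattern (sum over the support for discrete distributions, numerical integration for continuous ones) can be used to check other rows.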
Appendix


Probability Tables 


Table AIV.1 Cumulative Binomial Probabilities, $P(X \le x) = \sum_{i=0}^{x} p(i)$

  p =  | 0.01   0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90   0.95   0.99

n = 2
x = 0  | 0.980  0.903  0.810  0.640  0.490  0.360  0.250  0.160  0.090  0.040  0.010  0.003  0.000
    1  | 1.000  0.998  0.990  0.960  0.910  0.840  0.750  0.640  0.510  0.360  0.190  0.098  0.020
    2  | 1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

n = 3
x = 0  | 0.970  0.857  0.729  0.512  0.343  0.216  0.125  0.064  0.027  0.008  0.001  0.000  0.000
    1  | 1.000  0.993  0.972  0.896  0.784  0.648  0.500  0.352  0.216  0.104  0.028  0.007  0.000
    2  | 1.000  1.000  0.999  0.992  0.973  0.936  0.875  0.784  0.657  0.488  0.271  0.143  0.030
    3  | 1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

n = 4
x = 0  | 0.961  0.815  0.656  0.410  0.240  0.130  0.063  0.026  0.008  0.002  0.000  0.000  0.000
    1  | 0.999  0.986  0.948  0.819  0.652  0.475  0.313  0.179  0.084  0.027  0.004  0.000  0.000
    2  | 1.000  1.000  0.996  0.973  0.916  0.821  0.688  0.525  0.348  0.181  0.052  0.014  0.001
    3  | 1.000  1.000  1.000  0.998  0.992  0.974  0.938  0.870  0.760  0.590  0.344  0.185  0.039
    4  | 1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000

n = 5
x = 0  | 0.951  0.774  0.590  0.328  0.168  0.078  0.031  0.010  0.002  0.000  0.000  0.000  0.000
    1  | 0.999  0.977  0.919  0.737  0.528  0.337  0.188  0.087  0.031  0.007  0.000  0.000  0.000
    2  | 1.000  0.999  0.991  0.942  0.837  0.683  0.500  0.317  0.163  0.058  0.009  0.001  0.000
    3  | 1.000  1.000  1.000  0.993  0.969  0.913  0.813  0.663  0.472  0.263  0.081  0.023  0.001
    4  | 1.000  1.000  1.000  1.000  0.998  0.990  0.969  0.922  0.832  0.672  0.410  0.226  0.049

(continued)
Table AIV.1 (continued)

  p =  | 0.01   0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90   0.95   0.99

n = 6
x = 0  | 0.941  0.735  0.531  0.262  0.118  0.047  0.016  0.004  0.001  0.000  0.000  0.000  0.000
    1  | 0.999  0.967  0.886  0.655  0.420  0.233  0.109  0.041  0.011  0.002  0.000  0.000  0.000
    2  | 1.000  0.998  0.984  0.901  0.744  0.544  0.344  0.179  0.070  0.017  0.001  0.000  0.000
    3  | 1.000  1.000  0.999  0.983  0.930  0.821  0.656  0.456  0.256  0.099  0.016  0.002  0.000
    4  | 1.000  1.000  1.000  0.998  0.989  0.959  0.891  0.767  0.580  0.345  0.114  0.033  0.001
    5  | 1.000  1.000  1.000  1.000  0.999  0.996  0.984  0.953  0.882  0.738  0.469  0.265  0.059

n = 7
x = 0  | 0.932  0.698  0.478  0.210  0.082  0.028  0.008  0.002  0.000  0.000  0.000  0.000  0.000
    1  | 0.998  0.956  0.850  0.577  0.329  0.159  0.063  0.019  0.004  0.000  0.000  0.000  0.000
    2  | 1.000  0.996  0.974  0.852  0.647  0.420  0.227  0.096  0.029  0.005  0.000  0.000  0.000
    3  | 1.000  1.000  0.997  0.967  0.874  0.710  0.500  0.290  0.126  0.033  0.003  0.000  0.000
    4  | 1.000  1.000  1.000  0.995  0.971  0.904  0.773  0.580  0.353  0.148  0.026  0.004  0.000
    5  | 1.000  1.000  1.000  1.000  0.996  0.981  0.938  0.841  0.671  0.423  0.150  0.044  0.002
    6  | 1.000  1.000  1.000  1.000  1.000  0.998  0.992  0.972  0.918  0.790  0.522  0.302  0.068

n = 8
x = 0  | 0.923  0.663  0.430  0.168  0.058  0.017  0.004  0.001  0.000  0.000  0.000  0.000  0.000
    1  | 0.997  0.943  0.813  0.503  0.255  0.106  0.035  0.009  0.001  0.000  0.000  0.000  0.000
    2  | 1.000  0.994  0.962  0.797  0.552  0.315  0.145  0.050  0.011  0.001  0.000  0.000  0.000
    3  | 1.000  1.000  0.995  0.944  0.806  0.594  0.363  0.174  0.058  0.010  0.000  0.000  0.000
    4  | 1.000  1.000  1.000  0.990  0.942  0.826  0.637  0.406  0.194  0.056  0.005  0.000  0.000
    5  | 1.000  1.000  1.000  0.999  0.989  0.950  0.855  0.685  0.448  0.203  0.038  0.006  0.000
    6  | 1.000  1.000  1.000  1.000  0.999  0.991  0.965  0.894  0.745  0.497  0.187  0.057  0.003
    7  | 1.000  1.000  1.000  1.000  1.000  0.999  0.996  0.983  0.942  0.832  0.570  0.337  0.077

n = 9
x = 0  | 0.914  0.630  0.387  0.134  0.040  0.010  0.002  0.000  0.000  0.000  0.000  0.000  0.000
    1  | 0.997  0.929  0.775  0.436  0.196  0.071  0.020  0.004  0.000  0.000  0.000  0.000  0.000
    2  | 1.000  0.992  0.947  0.738  0.463  0.232  0.090  0.025  0.004  0.000  0.000  0.000  0.000
    3  | 1.000  0.999  0.992  0.914  0.730  0.483  0.254  0.099  0.025  0.003  0.000  0.000  0.000
    4  | 1.000  1.000  0.999  0.980  0.901  0.733  0.500  0.267  0.099  0.020  0.001  0.000  0.000
    5  | 1.000  1.000  1.000  0.997  0.975  0.901  0.746  0.517  0.270  0.086  0.008  0.001  0.000
    6  | 1.000  1.000  1.000  1.000  0.996  0.975  0.910  0.768  0.537  0.262  0.053  0.008  0.000
    7  | 1.000  1.000  1.000  1.000  1.000  0.996  0.980  0.929  0.804  0.564  0.225  0.071  0.003
    8  | 1.000  1.000  1.000  1.000  1.000  1.000  0.998  0.990  0.960  0.866  0.613  0.370  0.086
Table AIV.1 (continued)

  p =  | 0.01   0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90   0.95   0.99

n = 10
x = 0  | 0.904  0.599  0.349  0.107  0.028  0.006  0.001  0.000  0.000  0.000  0.000  0.000  0.000
    1  | 0.996  0.914  0.736  0.376  0.149  0.046  0.011  0.002  0.000  0.000  0.000  0.000  0.000
    2  | 1.000  0.988  0.930  0.678  0.383  0.167  0.055  0.012  0.002  0.000  0.000  0.000  0.000
    3  | 1.000  0.999  0.987  0.879  0.650  0.382  0.172  0.055  0.011  0.001  0.000  0.000  0.000
    4  | 1.000  1.000  0.998  0.967  0.850  0.633  0.377  0.166  0.047  0.006  0.000  0.000  0.000
    5  | 1.000  1.000  1.000  0.994  0.953  0.834  0.623  0.367  0.150  0.033  0.002  0.000  0.000
    6  | 1.000  1.000  1.000  0.999  0.989  0.945  0.828  0.618  0.350  0.121  0.013  0.001  0.000
    7  | 1.000  1.000  1.000  1.000  0.998  0.988  0.945  0.833  0.617  0.322  0.070  0.012  0.000
    8  | 1.000  1.000  1.000  1.000  1.000  0.998  0.989  0.954  0.851  0.624  0.264  0.086  0.004
    9  | 1.000  1.000  1.000  1.000  1.000  1.000  0.999  0.994  0.972  0.893  0.651  0.401  0.096

n = 15
x = 0  | 0.860  0.463  0.206  0.035  0.005  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    1  | 0.990  0.829  0.549  0.167  0.035  0.005  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    2  | 1.000  0.964  0.816  0.398  0.127  0.027  0.004  0.000  0.000  0.000  0.000  0.000  0.000
    3  | 1.000  0.995  0.944  0.648  0.297  0.091  0.018  0.002  0.000  0.000  0.000  0.000  0.000
    4  | 1.000  0.999  0.987  0.836  0.515  0.217  0.059  0.009  0.001  0.000  0.000  0.000  0.000
    5  | 1.000  1.000  0.998  0.939  0.722  0.403  0.151  0.034  0.004  0.000  0.000  0.000  0.000
    6  | 1.000  1.000  1.000  0.982  0.869  0.610  0.304  0.095  0.015  0.001  0.000  0.000  0.000
    7  | 1.000  1.000  1.000  0.996  0.950  0.787  0.500  0.213  0.050  0.004  0.000  0.000  0.000
    8  | 1.000  1.000  1.000  0.999  0.985  0.905  0.696  0.390  0.131  0.018  0.000  0.000  0.000
    9  | 1.000  1.000  1.000  1.000  0.996  0.966  0.849  0.597  0.278  0.061  0.002  0.000  0.000
   10  | 1.000  1.000  1.000  1.000  0.999  0.991  0.941  0.783  0.485  0.164  0.013  0.001  0.000
   11  | 1.000  1.000  1.000  1.000  1.000  0.998  0.982  0.909  0.703  0.352  0.056  0.005  0.000
   12  | 1.000  1.000  1.000  1.000  1.000  1.000  0.996  0.973  0.873  0.602  0.184  0.036  0.000
   13  | 1.000  1.000  1.000  1.000  1.000  1.000  1.000  0.995  0.965  0.833  0.451  0.171  0.010
   14  | 1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  0.995  0.965  0.794  0.537  0.140

n = 20
x = 0  | 0.818  0.358  0.122  0.012  0.001  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    1  | 0.983  0.736  0.392  0.069  0.008  0.001  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    2  | 0.999  0.925  0.677  0.206  0.035  0.004  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    3  | 1.000  0.984  0.867  0.411  0.107  0.016  0.001  0.000  0.000  0.000  0.000  0.000  0.000
Table AIV.1 (continued)

  p =  | 0.01   0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90   0.95   0.99

n = 20 (continued)
    4  | 1.000  0.997  0.957  0.630  0.238  0.051  0.006  0.000  0.000  0.000  0.000  0.000  0.000
    5  | 1.000  1.000  0.989  0.804  0.416  0.126  0.021  0.002  0.000  0.000  0.000  0.000  0.000
    6  | 1.000  1.000  0.998  0.913  0.608  0.250  0.058  0.006  0.000  0.000  0.000  0.000  0.000
    7  | 1.000  1.000  1.000  0.968  0.772  0.416  0.132  0.021  0.001  0.000  0.000  0.000  0.000
    8  | 1.000  1.000  1.000  0.990  0.887  0.596  0.252  0.057  0.005  0.000  0.000  0.000  0.000
    9  | 1.000  1.000  1.000  0.997  0.952  0.755  0.412  0.128  0.017  0.001  0.000  0.000  0.000
   10  | 1.000  1.000  1.000  0.999  0.983  0.872  0.588  0.245  0.048  0.003  0.000  0.000  0.000
   11  | 1.000  1.000  1.000  1.000  0.995  0.943  0.748  0.404  0.113  0.010  0.000  0.000  0.000
   12  | 1.000  1.000  1.000  1.000  0.999  0.979  0.868  0.584  0.228  0.032  0.000  0.000  0.000
   13  | 1.000  1.000  1.000  1.000  1.000  0.994  0.942  0.750  0.392  0.087  0.002  0.000  0.000
   14  | 1.000  1.000  1.000  1.000  1.000  0.998  0.979  0.874  0.584  0.196  0.011  0.000  0.000
   15  | 1.000  1.000  1.000  1.000  1.000  1.000  0.994  0.949  0.762  0.370  0.043  0.003  0.000
   16  | 1.000  1.000  1.000  1.000  1.000  1.000  0.999  0.984  0.893  0.589  0.133  0.016  0.000
   17  | 1.000  1.000  1.000  1.000  1.000  1.000  1.000  0.996  0.965  0.794  0.323  0.075  0.001
   18  | 1.000  1.000  1.000  1.000  1.000  1.000  1.000  0.999  0.992  0.931  0.608  0.264  0.017
   19  | 1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  0.999  0.988  0.878  0.642  0.182

n = 25
x = 0  | 0.778  0.277  0.072  0.004  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    1  | 0.974  0.642  0.271  0.027  0.002  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    2  | 0.998  0.873  0.537  0.098  0.009  0.000  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    3  | 1.000  0.966  0.764  0.234  0.033  0.002  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    4  | 1.000  0.993  0.902  0.421  0.090  0.009  0.000  0.000  0.000  0.000  0.000  0.000  0.000
    5  | 1.000  0.999  0.967  0.617  0.193  0.029  0.002  0.000  0.000  0.000  0.000  0.000  0.000
    6  | 1.000  1.000  0.991  0.780  0.341  0.074  0.007  0.000  0.000  0.000  0.000  0.000  0.000
    7  | 1.000  1.000  0.998  0.891  0.512  0.154  0.022  0.001  0.000  0.000  0.000  0.000  0.000
    8  | 1.000  1.000  1.000  0.953  0.677  0.274  0.054  0.004  0.000  0.000  0.000  0.000  0.000
    9  | 1.000  1.000  1.000  0.983  0.811  0.425  0.115  0.013  0.000  0.000  0.000  0.000  0.000
   10  | 1.000  1.000  1.000  0.994  0.902  0.586  0.212  0.034  0.002  0.000  0.000  0.000  0.000
   11  | 1.000  1.000  1.000  0.998  0.956  0.732  0.345  0.078  0.006  0.000  0.000  0.000  0.000
   12  | 1.000  1.000  1.000  1.000  0.983  0.846  0.500  0.154  0.017  0.000  0.000  0.000  0.000
   13  | 1.000  1.000  1.000  1.000  0.994  0.922  0.655  0.268  0.044  0.002  0.000  0.000  0.000
   14  | 1.000  1.000  1.000  1.000  0.998  0.966  0.788  0.414  0.098  0.006  0.000  0.000  0.000
   15  | 1.000  1.000  1.000  1.000  1.000  0.987  0.885  0.575  0.189  0.017  0.000  0.000  0.000
Table AIV.1 (continued)

  p =  | 0.01   0.05   0.10   0.20   0.30   0.40   0.50   0.60   0.70   0.80   0.90   0.95   0.99

n = 25 (continued)
   16  | 1.000  1.000  1.000  1.000  1.000  0.996  0.946  0.726  0.323  0.047  0.000  0.000  0.000
   17  | 1.000  1.000  1.000  1.000  1.000  0.999  0.978  0.846  0.488  0.109  0.002  0.000  0.000
   18  | 1.000  1.000  1.000  1.000  1.000  1.000  0.993  0.926  0.659  0.220  0.009  0.000  0.000
   19  | 1.000  1.000  1.000  1.000  1.000  1.000  0.998  0.971  0.807  0.383  0.033  0.001  0.000
   20  | 1.000  1.000  1.000  1.000  1.000  1.000  1.000  0.991  0.910  0.579  0.098  0.007  0.000
   21  | 1.000  1.000  1.000  1.000  1.000  1.000  1.000  0.998  0.967  0.766  0.236  0.034  0.000
   22  | 1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  0.991  0.902  0.463  0.127  0.002
   23  | 1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  0.998  0.973  0.729  0.358  0.026
   24  | 1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  0.996  0.928  0.723  0.222
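Entries of Table AIV.1 can be reproduced, or values of n and p not in the table obtained, by summing the binomial probability function. The following is a minimal sketch; the function name `binom_cdf` is introduced here for illustration.

```python
from math import comb

def binom_cdf(x, n, p):
    """Cumulative binomial probability P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(x + 1))

# Spot checks against the table, rounded to three decimals as in the table
check_n2 = round(binom_cdf(0, 2, 0.01), 3)     # table entry for n = 2, x = 0, p = 0.01
check_n10 = round(binom_cdf(3, 10, 0.20), 3)   # table entry for n = 10, x = 3, p = 0.20
check_n25 = round(binom_cdf(12, 25, 0.50), 3)  # table entry for n = 25, x = 12, p = 0.50
```

The three checks correspond to the tabulated values 0.980, 0.879, and 0.500.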
Table AIV.2 Standard Normal Table (entries give the area between 0 and z)

  z  | 0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09

 0.0 | 0.0000  0.0040  0.0080  0.0120  0.0160  0.0199  0.0239  0.0279  0.0319  0.0359
 0.1 | 0.0398  0.0438  0.0478  0.0517  0.0557  0.0596  0.0636  0.0675  0.0714  0.0753
 0.2 | 0.0793  0.0832  0.0871  0.0910  0.0948  0.0987  0.1026  0.1064  0.1103  0.1141
 0.3 | 0.1179  0.1217  0.1255  0.1293  0.1331  0.1368  0.1406  0.1443  0.1480  0.1517
 0.4 | 0.1554  0.1591  0.1628  0.1664  0.1700  0.1736  0.1772  0.1808  0.1844  0.1879

 0.5 | 0.1915  0.1950  0.1985  0.2019  0.2054  0.2088  0.2123  0.2157  0.2190  0.2224
 0.6 | 0.2257  0.2291  0.2324  0.2357  0.2389  0.2422  0.2454  0.2486  0.2517  0.2549
 0.7 | 0.2580  0.2611  0.2642  0.2673  0.2704  0.2734  0.2764  0.2794  0.2823  0.2852
 0.8 | 0.2881  0.2910  0.2939  0.2967  0.2995  0.3023  0.3051  0.3078  0.3106  0.3133
 0.9 | 0.3159  0.3186  0.3212  0.3238  0.3264  0.3289  0.3315  0.3340  0.3365  0.3389

 1.0 | 0.3413  0.3438  0.3461  0.3485  0.3508  0.3531  0.3554  0.3577  0.3599  0.3621
 1.1 | 0.3643  0.3665  0.3686  0.3708  0.3729  0.3749  0.3770  0.3790  0.3810  0.3830
 1.2 | 0.3849  0.3869  0.3888  0.3907  0.3925  0.3944  0.3962  0.3980  0.3997  0.4015
 1.3 | 0.4032  0.4049  0.4066  0.4082  0.4099  0.4115  0.4131  0.4147  0.4162  0.4177
 1.4 | 0.4192  0.4207  0.4222  0.4236  0.4251  0.4265  0.4279  0.4292  0.4306  0.4319

 1.5 | 0.4332  0.4345  0.4357  0.4370  0.4382  0.4394  0.4406  0.4418  0.4429  0.4441
 1.6 | 0.4452  0.4463  0.4474  0.4484  0.4495  0.4505  0.4515  0.4525  0.4535  0.4545
 1.7 | 0.4554  0.4564  0.4573  0.4582  0.4591  0.4599  0.4608  0.4616  0.4625  0.4633
 1.8 | 0.4641  0.4649  0.4656  0.4664  0.4671  0.4678  0.4686  0.4693  0.4699  0.4706
 1.9 | 0.4713  0.4719  0.4726  0.4732  0.4738  0.4744  0.4750  0.4756  0.4761  0.4767

 2.0 | 0.4772  0.4778  0.4783  0.4788  0.4793  0.4798  0.4803  0.4808  0.4812  0.4817
 2.1 | 0.4821  0.4826  0.4830  0.4834  0.4838  0.4842  0.4846  0.4850  0.4854  0.4857
 2.2 | 0.4861  0.4864  0.4868  0.4871  0.4875  0.4878  0.4881  0.4884  0.4887  0.4890
 2.3 | 0.4893  0.4896  0.4898  0.4901  0.4904  0.4906  0.4909  0.4911  0.4913  0.4916
 2.4 | 0.4918  0.4920  0.4922  0.4925  0.4927  0.4929  0.4931  0.4932  0.4934  0.4936

 2.5 | 0.4938  0.4940  0.4941  0.4943  0.4945  0.4946  0.4948  0.4949  0.4951  0.4952
 2.6 | 0.4953  0.4955  0.4956  0.4957  0.4959  0.4960  0.4961  0.4962  0.4963  0.4964
 2.7 | 0.4965  0.4966  0.4967  0.4968  0.4969  0.4970  0.4971  0.4972  0.4973  0.4974
 2.8 | 0.4974  0.4975  0.4976  0.4977  0.4977  0.4978  0.4979  0.4979  0.4980  0.4981
 2.9 | 0.4981  0.4982  0.4982  0.4983  0.4984  0.4984  0.4985  0.4985  0.4986  0.4986


3.0 | 0.4987 
3.1 | 0.4990 
3.2 | 0.4993 
3.3 | 0.4995 
3.4 | 0.4997 


0.4987 
0.4991 
0.4993 
0.4995 
0.4997 


0.4987 
0.4991 
0.4994 
0.4995 
0.4997 


0.4988 
0.4991 
0.4994 
0.4996 
0.4997 


0.4988 
0.4992 
0.4994 
0.4996 
0.4997 


0.4989 
0.4992 
0.4994 
0.4996 
0.4997 


0.4989 
0.4992 
0.4994 
0.4996 
0.4997 


0.4989 
0.4992 
0.4995 
0.4996 
0.4997 


0.4990 
0.4993 
0.4995 
0.4996 
0.4997 


0.4990 
0.4993 
0.4995 
0.4997 
0.4998 
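
Entries of the standard normal table can be spot-checked numerically. The following is a minimal sketch, assuming Python's standard `statistics.NormalDist` is an acceptable stand-in for the integral; the helper name `area_0_to_z` is hypothetical and not part of the text.

```python
from statistics import NormalDist

def area_0_to_z(z: float) -> float:
    """Area under the standard normal curve between 0 and z (for z >= 0)."""
    # The standard normal CDF at 0 is 0.5, so subtract the left half.
    return NormalDist().cdf(z) - 0.5

# Print a few values in the table's four-decimal format.
for z in (1.00, 1.96, 2.58):
    print(f"z = {z:.2f}  area = {area_0_to_z(z):.4f}")
```

By symmetry, the area between -z and 0 equals the area between 0 and z, so the same helper covers negative z after taking the absolute value.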
Table AIV.3 t-Table 

Right Tail Probabilities 

df\p | 0.40 0.25 0.10 0.05 0.025 0.01 0.005 0.0005 

1 | 0.324920 1.000000 3.077684 6.313752 12.70620 31.82052 63.65674 636.6192 
2 | 0.288675 0.816497 1.885618 2.919986 4.30265 6.96456 9.92484 31.5991 
3 | 0.276671 0.764892 1.637744 2.353363 3.18245 4.54070 5.84091 12.9240 
4 | 0.270722 0.740697 1.533206 2.131847 2.77645 3.74695 4.60409 8.6103 
5 | 0.267181 0.726687 1.475884 2.015048 2.57058 3.36493 4.03214 6.8688 
6 | 0.264835 0.717558 1.439756 1.943180 2.44691 3.14267 3.70743 5.9588 
7 | 0.263167 0.711142 1.414924 1.894579 2.36462 2.99795 3.49948 5.4079 
8 | 0.261921 0.706387 1.396815 1.859548 2.30600 2.89646 3.35539 5.0413 
9 | 0.260955 0.702722 1.383029 1.833113 2.26216 2.82144 3.24984 4.7809 
10 | 0.260185 0.699812 1.372184 1.812461 2.22814 2.76377 3.16927 4.5869 
11 | 0.259556 0.697445 1.363430 1.795885 2.20099 2.71808 3.10581 4.4370 
12 | 0.259033 0.695483 1.356217 1.782288 2.17881 2.68100 3.05454 4.3178 
13 | 0.258591 0.693829 1.350171 1.770933 2.16037 2.65031 3.01228 4.2208 
14 | 0.258213 0.692417 1.345030 1.761310 2.14479 2.62449 2.97684 4.1405 
15 | 0.257885 0.691197 1.340606 1.753050 2.13145 2.60248 2.94671 4.0728 
16 | 0.257599 0.690132 1.336757 1.745884 2.11991 2.58349 2.92078 4.0150 
17 | 0.257347 0.689195 1.333379 1.739607 2.10982 2.56693 2.89823 3.9651 
18 | 0.257123 0.688364 1.330391 1.734064 2.10092 2.55238 2.87844 3.9216 
19 | 0.256923 0.687621 1.327728 1.729133 2.09302 2.53948 2.86093 3.8834 
20 | 0.256743 0.686954 1.325341 1.724718 2.08596 2.52798 2.84534 3.8495 
21 | 0.256580 0.686352 1.323188 1.720743 2.07961 2.51765 2.83136 3.8193 
22 | 0.256432 0.685805 1.321237 1.717144 2.07387 2.50832 2.81876 3.7921 
23 | 0.256297 0.685306 1.319460 1.713872 2.06866 2.49987 2.80734 3.7676 
24 | 0.256173 0.684850 1.317836 1.710882 2.06390 2.49216 2.79694 3.7454 
25 | 0.256060 0.684430 1.316345 1.708141 2.05954 2.48511 2.78744 3.7251 
26 | 0.255955 0.684043 1.314972 1.705618 2.05553 2.47863 2.77871 3.7066 
27 | 0.255858 0.683685 1.313703 1.703288 2.05183 2.47266 2.77068 3.6896 
28 | 0.255768 0.683353 1.312527 1.701131 2.04841 2.46714 2.76326 3.6739 
29 | 0.255684 0.683044 1.311434 1.699127 2.04523 2.46202 2.75639 3.6594 
30 | 0.255605 0.682756 1.310415 1.697261 2.04227 2.45726 2.75000 3.6460 
∞ | 0.253347 0.674490 1.281552 1.644854 1.95996 2.32635 2.57583 3.2905 
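
The last row of the t-table (infinite degrees of freedom) coincides with the upper quantiles of the standard normal distribution, which gives a quick way to check it. The sketch below assumes Python's standard `statistics.NormalDist`; the name `z_upper` is illustrative only.

```python
from statistics import NormalDist

# Right-tail probabilities used as column headings in the t-table.
RIGHT_TAIL = (0.40, 0.25, 0.10, 0.05, 0.025, 0.01, 0.005, 0.0005)

def z_upper(p: float) -> float:
    """Value z with P(Z > z) = p for a standard normal Z."""
    return NormalDist().inv_cdf(1.0 - p)

# Each printed value should agree with the infinite-df row of the table.
for p in RIGHT_TAIL:
    print(f"p = {p:<7} z = {z_upper(p):.6f}")
```

Rows with finite degrees of freedom need the t-distribution itself, which the Python standard library does not provide; a statistical package would be required for those.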


766 APPENDIX IV Probability Tables 


Table AIV.4 Chi-Square Probabilities (right-tail probability p)


χ² 


df\p | 0.995 0.99 0.975 0.95 0.90 0.10 0.05 0.025 0.01 0.005 


1 4×10⁻⁵ 16×10⁻⁵ 0.001 0.004 0.016 2.706 3.841 5.024 6.635 7.879 


2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210 10.597 
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838 
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860 
5 0.412 0.554 0.831 1.145 1.610 9.236 11.070 12.833 15.086 16.750 
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548 
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278 
8 1.344 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955 
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589 
10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188 
11 2.603 3.053 3.816 4.575 5.578 17.275 19.675 21.920 24.725 26.757 
12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.300 


13 3.565 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688 29.819 
14 4.075 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141 31.319 


15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801 
16 5.142 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000 34.267 
17 5.697 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409 35.718 
18 6.265 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805 37.156 
19 6.844 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191 38.582 
20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566 39.997 
21 8.034 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932 41.401 
22 8.643 9.542 10.982 12.338 14.041 30.813 33.924 36.781 40.289 42.796 


23 9.260 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638 44.181 
24 9.886 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980 45.559 
25 10.520 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314 46.928 
26 11.160 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642 48.290 
27 11.808 12.879 14.573 16.151 18.114 36.741 40.113 43.195 46.963 49.645 
28 12.461 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278 50.993 
29 13.121 14.256 16.047 17.708 19.768 39.087 42.557 45.722 49.588 52.336 
30 13.787 14.953 16.791 18.493 20.599 40.256 43.773 46.979 50.892 53.672 
40 20.707 22.164 24.433 26.509 29.051 51.805 55.758 59.342 63.691 66.766 
50 27.991 29.707 32.357 34.764 37.689 63.167 67.505 71.420 76.154 79.490 
60 35.534 37.485 40.482 43.188 46.459 74.397 79.082 83.298 88.379 91.952 
70 43.275 45.442 48.758 51.739 55.329 85.527 90.531 95.023 100.425 104.215 
80 51.172 53.540 57.153 60.391 64.278 96.578 101.879 106.629 112.329 116.321 
90 59.196 61.754 65.647 69.126 73.291 107.565 113.145 118.136 124.116 128.299 
100 67.328 70.065 74.222 77.929 82.358 118.498 124.342 129.561 135.807 140.169 
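
For degrees of freedom not listed, an approximate chi-square critical value can be obtained from a normal quantile. The sketch below uses the Wilson-Hilferty cube-root approximation, which is a swapped-in technique and not the method used to build the table; the function name `chi2_upper_wh` is hypothetical. It tends to agree with the tabulated values to about two decimals for moderate and large df and is less accurate for very small df.

```python
from statistics import NormalDist

def chi2_upper_wh(p: float, df: int) -> float:
    """Approximate x with P(chi-square_df > x) = p (Wilson-Hilferty)."""
    z = NormalDist().inv_cdf(1.0 - p)   # upper-p standard normal quantile
    h = 2.0 / (9.0 * df)
    # Cube-root transformation of a chi-square variable is roughly normal.
    return df * (1.0 - h + z * h ** 0.5) ** 3

# Compare with the tabulated entries 18.307 (df = 10) and 124.342 (df = 100).
for df in (10, 100):
    print(f"df = {df:3d}  approx = {chi2_upper_wh(0.05, df):.3f}")
```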
Table AIV.5 Percentage Points of F-Distributions 
Numerator d.f. 
Denominator 
d.f. α 1 2 3 4 5 6 7 8 9 

1 .100 39.86 49.50 53.59 55.83 57.24 58.20 58.91 59.44 59.86 
.050 161.4 199.5 215.7 224.6 230.2 234.0 236.8 238.9 240.5 
.025 647.8 799.5 864.2 899.6 921.8 937.1 948.2 956.7 963.3 
.010 4052 4999.5 5403 5625 5764 5859 5928 5982 6022 
.005 16211 20000 21615 22500 23056 23437 23715 23925 24091 

2 .100 8.53 9.00 9.16 9.24 9.29 9.33 9.35 9.37 9.38 
.050 18.51 19.00 19.16 19.25 19.30 19.33 19.35 19.37 19.38 
.025 38.51 39.00 39.17 39.25 39.30 39.33 39.36 39.37 39.39 
.010 98.50 99.00 99.17 99.25 99.30 99.33 99.36 99.37 99.39 
.005 198.5 199.0 199.2 199.2 199.3 199.3 199.4 199.4 199.4 

3 .100 5.54 5.46 5.39 5.34 5.31 5.28 5.27 5.25 5.24 
.050 10.13 9.55 9.28 9.12 9.01 8.94 8.89 8.85 8.81 
.025 17.44 16.04 15.44 15.10 14.88 14.73 14.62 14.54 14.47 
.010 34.12 30.82 29.46 28.71 28.24 27.91 27.67 27.49 27.35 
.005 55.55 49.80 47.47 46.19 45.39 44.84 44.43 44.13 43.88 

4 .100 4.54 4.32 4.19 4.11 4.05 4.01 3.98 3.95 3.94 
.050 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 
.025 12.22 10.65 9.98 9.60 9.36 9.20 9.07 8.98 8.90 
.010 21.20 18.00 16.69 15.98 15.52 15.21 14.98 14.80 14.66 
.005 31.33 26.28 24.26 23.15 22.46 21.97 21.62 21.35 21.14 

5 .100 4.06 3.78 3.62 3.52 3.45 3.40 3.37 3.34 3.32 
.050 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.77 
.025 10.01 8.43 7.76 7.39 7.15 6.98 6.85 6.76 6.68 
.010 16.26 13.27 12.06 11.39 10.97 10.67 10.46 10.29 10.16 
.005 22.78 18.31 16.53 15.56 14.94 14.51 14.20 13.96 13.77 

6 .100 3.78 3.46 3.29 3.18 3.11 3.05 3.01 2.98 2.96 
.050 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 
.025 8.81 7.26 6.60 6.23 5.99 5.82 5.70 5.60 5.52 
.010 13.75 10.92 9.78 9.15 8.75 8.47 8.26 8.10 7.98 
.005 18.63 14.54 12.92 12.03 11.46 11.07 10.79 10.57 10.39 

7 .100 3.59 3.26 3.07 2.96 2.88 2.83 2.78 2.75 2.72 
.050 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 
.025 8.07 6.54 5.89 5.52 5.29 5.12 4.99 4.90 4.82 
.010 12.25 9.55 8.45 7.85 7.46 7.19 6.99 6.84 6.72 
.005 16.24 12.40 10.88 10.05 9.52 9.16 8.89 8.68 8.51 
Table AIV.5 (continued) 
Numerator d.f. 

10 12 15 20 24 30 40 60 120 ∞ α | d.f. 

60.19 60.71 61.22 61.74 62.00 62.26 62.53 62.79 63.06 63.33 .100 | 1 
241.9 243.9 245.9 248.0 249.1 250.1 251.1 252.2 253.3 254.3 .050 
968.6 976.7 984.9 993.1 997.2 1001 1006 1010 1014 1018 .025 
6056 6106 6157 6209 6235 6261 6287 6313 6339 6366 .010 
24224 24426 24630 24836 24940 25044 25148 25253 25359 25465 .005 

9.39 9.41 9.42 9.44 9.45 9.46 9.47 9.47 9.48 9.49 .100 | 2 
19.40 19.41 19.43 19.45 19.45 19.46 19.47 19.48 19.49 19.50 .050 
39.40 39.41 39.43 39.45 39.46 39.46 39.47 39.48 39.49 39.50 .025 
99.40 99.42 99.43 99.45 99.46 99.47 99.47 99.48 99.49 99.50 .010 
199.4 199.4 199.4 199.4 199.5 199.5 199.5 199.5 199.5 199.5 .005 

5.23 5.22 5.20 5.18 5.18 5.17 5.16 5.15 5.14 5.13 .100 | 3 
8.79 8.74 8.70 8.66 8.64 8.62 8.59 8.57 8.55 8.53 .050 
14.42 14.34 14.25 14.17 14.12 14.08 14.04 13.99 13.95 13.90 .025 
27.23 27.05 26.87 26.69 26.60 26.50 26.41 26.32 26.22 26.13 .010 
43.69 43.39 43.08 42.78 42.62 42.47 42.31 42.15 41.99 41.83 .005 

3.92 3.90 3.87 3.84 3.83 3.82 3.80 3.79 3.78 3.76 .100 | 4 
5.96 5.91 5.86 5.80 5.77 5.75 5.72 5.69 5.66 5.63 .050 
8.84 8.75 8.66 8.56 8.51 8.46 8.41 8.36 8.31 8.26 .025 
14.55 14.37 14.20 14.02 13.93 13.84 13.75 13.65 13.56 13.46 .010 
20.97 20.70 20.44 20.17 20.03 19.89 19.75 19.61 19.47 19.32 .005 

3.30 3.27 3.24 3.21 3.19 3.17 3.16 3.14 3.12 3.10 .100 | 5 
4.74 4.68 4.62 4.56 4.53 4.50 4.46 4.43 4.40 4.36 .050 
6.62 6.52 6.43 6.33 6.28 6.23 6.18 6.12 6.07 6.02 .025 
10.05 9.89 9.72 9.55 9.47 9.38 9.29 9.20 9.11 9.02 .010 
13.62 13.38 13.15 12.90 12.78 12.66 12.53 12.40 12.27 12.14 .005 

2.94 2.90 2.87 2.84 2.82 2.80 2.78 2.76 2.74 2.72 .100 | 6 
4.06 4.00 3.94 3.87 3.84 3.81 3.77 3.74 3.70 3.67 .050 
5.46 5.37 5.27 5.17 5.12 5.07 5.01 4.96 4.90 4.85 .025 
7.87 7.72 7.56 7.40 7.31 7.23 7.14 7.06 6.97 6.88 .010 
10.25 10.03 9.81 9.59 9.47 9.36 9.24 9.12 9.00 8.88 .005 

2.70 2.67 2.63 2.59 2.58 2.56 2.54 2.51 2.49 2.47 .100 | 7 
3.64 3.57 3.51 3.44 3.41 3.38 3.34 3.30 3.27 3.23 .050 
4.76 4.67 4.57 4.47 4.42 4.36 4.31 4.25 4.20 4.14 .025 
6.62 6.47 6.31 6.16 6.07 5.99 5.91 5.82 5.74 5.65 .010 
8.38 8.18 7.97 7.75 7.65 7.53 7.42 7.31 7.19 7.08 .005 
Table AIV.5 (continued) 
Numerator d.f. 
Denominator 
d.f. α 1 2 3 4 5 6 7 8 9 

8 .100 3.46 3.11 2.92 2.81 2.73 2.67 2.62 2.59 2.56 
.050 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 
.025 7.57 6.06 5.42 5.05 4.82 4.65 4.53 4.43 4.36 
.010 11.26 8.65 7.59 7.01 6.63 6.37 6.18 6.03 5.91 
.005 14.69 11.04 9.60 8.81 8.30 7.95 7.69 7.50 7.34 

9 .100 3.36 3.01 2.81 2.69 2.61 2.55 2.51 2.47 2.44 
.050 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 
.025 7.21 5.71 5.08 4.72 4.48 4.32 4.20 4.10 4.03 
.010 10.56 8.02 6.99 6.42 6.06 5.80 5.61 5.47 5.35 
.005 13.61 10.11 8.72 7.96 7.47 7.13 6.88 6.69 6.54 

10 .100 3.29 2.92 2.73 2.61 2.52 2.46 2.41 2.38 2.35 
.050 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 
.025 6.94 5.46 4.83 4.47 4.24 4.07 3.95 3.85 3.78 
.010 10.04 7.56 6.55 5.99 5.64 5.39 5.20 5.06 4.94 
.005 12.83 9.43 8.08 7.34 6.87 6.54 6.30 6.12 5.97 

11 .100 3.23 2.86 2.66 2.54 2.45 2.39 2.34 2.30 2.27 
.050 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 
.025 6.72 5.26 4.63 4.28 4.04 3.88 3.76 3.66 3.59 
.010 9.65 7.21 6.22 5.67 5.32 5.07 4.89 4.74 4.63 
.005 12.23 8.91 7.60 6.88 6.42 6.10 5.86 5.68 5.54 

12 .100 3.18 2.81 2.61 2.48 2.39 2.33 2.28 2.24 2.21 
.050 4.75 3.89 3.49 3.26 3.11 3.00 2.91 2.85 2.80 
.025 6.55 5.10 4.47 4.12 3.89 3.73 3.61 3.51 3.44 
.010 9.33 6.93 5.95 5.41 5.06 4.82 4.64 4.50 4.39 
.005 11.75 8.51 7.23 6.52 6.07 5.76 5.52 5.35 5.20 

13 .100 3.14 2.76 2.56 2.43 2.35 2.28 2.23 2.20 2.16 
.050 4.67 3.81 3.41 3.18 3.03 2.92 2.83 2.77 2.71 
.025 6.41 4.97 4.35 4.00 3.77 3.60 3.48 3.39 3.31 
.010 9.07 6.70 5.74 5.21 4.86 4.62 4.44 4.30 4.19 
.005 11.37 8.19 6.93 6.23 5.79 5.48 5.25 5.08 4.94 

14 .100 3.10 2.73 2.52 2.39 2.31 2.24 2.19 2.15 2.12 
.050 4.60 3.74 3.34 3.11 2.96 2.85 2.76 2.70 2.65 
.025 6.30 4.86 4.24 3.89 3.66 3.50 3.38 3.29 3.21 
.010 8.86 6.51 5.56 5.04 4.69 4.46 4.28 4.14 4.03 
.005 11.06 7.92 6.68 6.00 5.56 5.26 5.03 4.86 4.72 

(continued) 
Table AIV.5 (continued) 
Numerator d.f. 

10 12 15 20 24 30 40 60 120 ∞ α d.f. 

2.54 2.50 2.46 2.42 2.40 2.38 2.36 2.34 2.32 2.29 .100 8 
3.35 3.28 3.22 3.15 3.12 3.08 3.04 3.01 2.97 2.93 .050 
4.30 4.20 4.10 4.00 3.95 3.89 3.84 3.78 3.73 3.67 .025 
5.81 5.67 5.52 5.36 5.28 5.20 5.12 5.03 4.95 4.86 .010 
7.21 7.01 6.81 6.61 6.50 6.40 6.29 6.18 6.06 5.95 .005 

2.42 2.38 2.34 2.30 2.28 2.25 2.23 2.21 2.18 2.16 .100 9 
3.14 3.07 3.01 2.94 2.90 2.86 2.83 2.79 2.75 2.71 .050 
3.96 3.87 3.77 3.67 3.61 3.56 3.51 3.45 3.39 3.33 .025 
5.26 5.14 4.96 4.81 4.73 4.65 4.57 4.48 4.40 4.31 .010 
6.42 6.23 6.03 5.83 5.73 5.62 5.52 5.41 5.30 5.19 .005 

2.32 2.28 2.24 2.20 2.18 2.16 2.13 2.11 2.08 2.06 .100 10 
2.98 2.91 2.85 2.77 2.74 2.70 2.66 2.62 2.58 2.54 .050 
3.72 3.62 3.52 3.42 3.37 3.31 3.26 3.20 3.14 3.08 .025 
4.85 4.71 4.56 4.41 4.33 4.25 4.17 4.08 4.00 3.91 .010 
5.85 5.66 5.47 5.27 5.17 5.07 4.97 4.86 4.75 4.64 .005 

2.25 2.21 2.17 2.12 2.10 2.08 2.05 2.03 2.00 1.97 .100 11 
2.85 2.79 2.72 2.65 2.61 2.57 2.53 2.49 2.45 2.40 .050 
3.53 3.43 3.33 3.23 3.17 3.12 3.06 3.00 2.94 2.88 .025 
4.54 4.40 4.25 4.10 4.02 3.94 3.86 3.78 3.69 3.60 .010 
5.42 5.24 5.05 4.86 4.76 4.65 4.55 4.44 4.34 4.23 .005 

2.19 2.15 2.10 2.06 2.04 2.01 1.99 1.96 1.93 1.90 .100 12 
2.75 2.69 2.62 2.54 2.51 2.47 2.43 2.38 2.34 2.30 .050 
3.37 3.28 3.18 3.07 3.02 2.96 2.91 2.85 2.79 2.72 .025 
4.30 4.16 4.01 3.86 3.78 3.70 3.62 3.54 3.45 3.36 .010 
5.09 4.91 4.72 4.53 4.43 4.33 4.23 4.12 4.01 3.90 .005 

2.14 2.10 2.05 2.01 1.98 1.96 1.93 1.90 1.88 1.85 .100 13 
2.67 2.60 2.53 2.46 2.42 2.38 2.34 2.30 2.25 2.21 .050 
3.25 3.15 3.05 2.95 2.89 2.84 2.78 2.72 2.66 2.60 .025 
4.10 3.96 3.82 3.66 3.59 3.51 3.43 3.34 3.25 3.17 .010 
4.82 4.64 4.46 4.27 4.17 4.07 3.97 3.87 3.76 3.65 .005 

2.10 2.05 2.01 1.96 1.94 1.91 1.89 1.86 1.83 1.80 .100 14 
2.60 2.53 2.46 2.39 2.35 2.31 2.27 2.22 2.18 2.13 .050 
3.15 3.05 2.95 2.84 2.79 2.73 2.67 2.61 2.55 2.49 .025 
3.94 3.80 3.66 3.51 3.43 3.35 3.27 3.18 3.09 3.00 .010 
4.60 4.43 4.25 4.06 3.96 3.86 3.76 3.66 3.55 3.44 .005 
Table AIV.5 (continued) 
Numerator d.f. 
Denominator 

d.f. α 1 2 3 4 5 6 7 8 9 

15 .100 3.07 2.70 2.49 2.36 2.27 2.21 2.16 2.12 2.09 
.050 4.54 3.68 3.29 3.06 2.90 2.79 2.71 2.64 2.59 
.025 6.20 4.77 4.15 3.80 3.58 3.41 3.29 3.20 3.12 
.010 8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 
.005 10.80 7.70 6.48 5.80 5.37 5.07 4.85 4.67 4.54 

16 .100 3.05 2.67 2.46 2.33 2.24 2.18 2.13 2.09 2.06 
.050 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 
.025 6.12 4.69 4.08 3.73 3.50 3.34 3.22 3.12 3.05 
.010 8.53 6.23 5.29 4.77 4.44 4.20 4.03 3.89 3.78 
.005 10.58 7.51 6.30 5.64 5.21 4.91 4.69 4.52 4.38 

17 .100 3.03 2.64 2.44 2.31 2.22 2.15 2.10 2.06 2.03 
.050 4.45 3.59 3.20 2.96 2.81 2.70 2.61 2.55 2.49 
.025 6.04 4.62 4.01 3.66 3.44 3.28 3.16 3.06 2.98 
.010 8.40 6.11 5.18 4.67 4.34 4.10 3.93 3.79 3.68 
.005 10.38 7.35 6.16 5.50 5.07 4.78 4.56 4.39 4.25 

18 .100 3.01 2.62 2.42 2.29 2.20 2.13 2.08 2.04 2.00 
.050 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 
.025 5.98 4.56 3.95 3.61 3.38 3.22 3.10 3.01 2.93 
.010 8.29 6.01 5.09 4.58 4.25 4.01 3.84 3.71 3.60 
.005 10.22 7.21 6.03 5.37 4.96 4.66 4.44 4.28 4.14 

19 .100 2.99 2.61 2.40 2.27 2.18 2.11 2.06 2.02 1.98 
.050 4.38 3.52 3.13 2.90 2.74 2.63 2.54 2.48 2.42 
.025 5.92 4.51 3.90 3.56 3.33 3.17 3.05 2.96 2.88 
.010 8.18 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52 
.005 10.07 7.09 5.92 5.27 4.85 4.56 4.34 4.18 4.04 

20 .100 2.97 2.59 2.38 2.25 2.16 2.09 2.04 2.00 1.96 
.050 4.35 3.49 3.10 2.87 2.71 2.60 2.51 2.45 2.39 
.025 5.87 4.46 3.86 3.51 3.29 3.13 3.01 2.91 2.84 
.010 8.10 5.85 4.94 4.43 4.10 3.87 3.70 3.56 3.46 
.005 9.94 6.99 5.82 5.17 4.76 4.47 4.26 4.09 3.96 

21 .100 2.96 2.57 2.36 2.23 2.14 2.08 2.02 1.98 1.95 
.050 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 
.025 5.83 4.42 3.82 3.48 3.25 3.09 2.97 2.87 2.80 
.010 8.02 5.78 4.87 4.37 4.04 3.81 3.64 3.51 3.40 
.005 9.83 6.89 5.73 5.09 4.68 4.39 4.18 4.01 3.88 
Table AIV.5 (continued) 
Numerator d.f. 

10 12 15 20 24 30 40 60 120 ∞ α d.f. 

2.06 2.02 1.97 1.92 1.90 1.87 1.85 1.82 1.79 1.76 .100 15 
2.54 2.48 2.40 2.33 2.29 2.25 2.20 2.16 2.11 2.07 .050 
3.06 2.96 2.86 2.76 2.70 2.64 2.59 2.52 2.46 2.40 .025 
3.80 3.67 3.52 3.37 3.29 3.21 3.13 3.05 2.96 2.87 .010 
4.42 4.25 4.07 3.88 3.79 3.69 3.58 3.48 3.37 3.26 .005 

2.03 1.99 1.94 1.89 1.87 1.84 1.81 1.78 1.75 1.72 .100 16 
2.49 2.42 2.35 2.28 2.24 2.19 2.15 2.11 2.06 2.01 .050 
2.99 2.89 2.79 2.68 2.63 2.57 2.51 2.45 2.38 2.32 .025 
3.69 3.55 3.41 3.26 3.18 3.10 3.02 2.93 2.84 2.75 .010 
4.27 4.10 3.92 3.73 3.64 3.54 3.44 3.33 3.22 3.11 .005 

2.00 1.96 1.91 1.86 1.84 1.81 1.78 1.75 1.72 1.69 .100 17 
2.45 2.38 2.31 2.23 2.19 2.15 2.10 2.06 2.01 1.96 .050 
2.92 2.82 2.72 2.62 2.56 2.50 2.44 2.38 2.32 2.25 .025 
3.59 3.46 3.31 3.16 3.08 3.00 2.92 2.83 2.75 2.65 .010 
4.14 3.97 3.79 3.61 3.51 3.41 3.31 3.21 3.10 2.98 .005 

1.98 1.93 1.89 1.84 1.81 1.78 1.75 1.72 1.69 1.66 .100 18 
2.41 2.34 2.27 2.19 2.15 2.11 2.06 2.02 1.97 1.92 .050 
2.87 2.77 2.67 2.56 2.50 2.44 2.38 2.32 2.26 2.19 .025 
3.51 3.37 3.23 3.08 3.00 2.92 2.84 2.75 2.66 2.57 .010 
4.03 3.86 3.68 3.50 3.40 3.30 3.20 3.10 2.99 2.87 .005 

1.96 1.91 1.86 1.81 1.79 1.76 1.73 1.70 1.67 1.63 .100 19 
2.38 2.31 2.23 2.16 2.11 2.07 2.03 1.98 1.93 1.88 .050 
2.82 2.72 2.62 2.51 2.45 2.39 2.33 2.27 2.20 2.13 .025 
3.43 3.30 3.15 3.00 2.92 2.84 2.76 2.67 2.58 2.49 .010 
3.93 3.76 3.59 3.40 3.31 3.21 3.11 3.00 2.89 2.78 .005 

1.94 1.89 1.84 1.79 1.77 1.74 1.71 1.68 1.64 1.61 .100 20 
2.35 2.28 2.20 2.12 2.08 2.04 1.99 1.95 1.90 1.84 .050 
2.77 2.68 2.57 2.46 2.41 2.35 2.29 2.22 2.16 2.09 .025 
3.37 3.23 3.09 2.94 2.86 2.78 2.69 2.61 2.52 2.42 .010 
3.85 3.68 3.50 3.32 3.22 3.12 3.02 2.92 2.81 2.69 .005 

1.92 1.87 1.83 1.78 1.75 1.72 1.69 1.66 1.62 1.59 .100 21 
2.32 2.25 2.18 2.10 2.05 2.01 1.96 1.92 1.87 1.81 .050 
2.73 2.64 2.53 2.42 2.37 2.31 2.25 2.18 2.11 2.04 .025 
3.31 3.17 3.03 2.88 2.80 2.72 2.64 2.55 2.46 2.36 .010 
3.77 3.60 3.43 3.24 3.15 3.05 2.95 2.84 2.73 2.61 .005 
Table AIV.5 (continued) 
Numerator d.f. 
Denominator 
d.f. α 1 2 3 4 5 6 7 8 9 
22 .100 2.95 2.56 2.35 2.22 2.13 2.06 2.01 1.97 1.93 


23 .100 2.94 2.55 2.34 2.21 2.11 2.05 1.99 1.95 1.92 


.005 9.63 6.73 5.58 4.95 4.54 4.26 4.05 3.88 3.75 
24 .100 2.93 2.54 2.33 2.19 2.10 2.04 1.98 1.94 1.91 


.005 9.55 6.66 5.52 4.89 4.49 4.20 3.99 3.83 3.69 
25 .100 2.92 2.53 2.32 2.18 2.09 2.02 1.97 1.93 1.89 


.005 9.48 6.60 5.46 4.84 4.43 4.15 3.94 3.78 3.64 
26 .100 2.91 2.52 2.31 2.17 2.08 2.01 1.96 1.92 1.88 


.005 9.41 6.54 5.41 4.79 4.38 4.10 3.89 3.73 3.60 
27 .100 2.90 2.51 2.30 2.17 2.07 2.00 1.95 1.91 1.87 


.010 7.68 5.49 4.60 4.11 3.78 3.56 3.39 3.26 3.15 
.005 9.34 6.49 5.36 4.74 4.34 4.06 3.85 3.69 3.56 


28 .100 2.89 2.50 2.29 2.16 2.06 2.00 1.94 1.90 1.87 
.050 4.20 3.34 2.95 2.71 2.56 2.45 2.36 2.29 2.24 
.025 5.61 4.22 3.63 3.29 3.06 2.90 2.78 2.69 2.61 
.010 7.64 5.45 4.57 4.07 3.75 3.53 3.36 3.23 3.12 
.005 9.28 6.44 5.32 4.70 4.30 4.02 3.81 3.65 3.52 


(continued) 
Table AIV.5 (continued) 
Numerator d.f. 

10 12 15 20 24 30 40 60 120 ∞ α | d.f. 

1.90 1.86 1.81 1.76 1.73 1.70 1.67 1.64 1.60 1.57 .100 | 22 
2.30 2.23 2.15 2.07 2.03 1.98 1.94 1.89 1.84 1.78 .050 
2.70 2.60 2.50 2.39 2.33 2.27 2.21 2.14 2.08 2.00 .025 
3.26 3.12 2.98 2.83 2.75 2.67 2.58 2.50 2.40 2.31 .010 
3.70 3.54 3.36 3.18 3.08 2.98 2.88 2.77 2.66 2.55 .005 

1.89 1.84 1.80 1.74 1.72 1.69 1.66 1.62 1.59 1.55 .100 | 23 
2.27 2.20 2.13 2.05 2.01 1.96 1.91 1.86 1.81 1.76 .050 
2.67 2.57 2.47 2.36 2.30 2.24 2.18 2.11 2.04 1.97 .025 
3.21 3.07 2.93 2.78 2.70 2.62 2.54 2.45 2.35 2.26 .010 
3.64 3.47 3.30 3.12 3.02 2.92 2.82 2.71 2.60 2.48 .005 

1.88 1.83 1.78 1.73 1.70 1.67 1.64 1.61 1.57 1.53 .100 | 24 
2.25 2.18 2.11 2.03 1.98 1.94 1.89 1.84 1.79 1.73 .050 
2.64 2.54 2.44 2.33 2.27 2.21 2.15 2.08 2.01 1.94 .025 
3.17 3.03 2.89 2.74 2.66 2.58 2.49 2.40 2.31 2.21 .010 
3.59 3.42 3.25 3.06 2.97 2.87 2.77 2.66 2.55 2.43 .005 

1.87 1.82 1.77 1.72 1.69 1.66 1.63 1.59 1.56 1.52 .100 | 25 
2.24 2.16 2.09 2.01 1.96 1.92 1.87 1.82 1.77 1.71 .050 
2.61 2.51 2.41 2.30 2.24 2.18 2.12 2.05 1.98 1.91 .025 
3.13 2.99 2.85 2.70 2.62 2.54 2.45 2.36 2.27 2.17 .010 
3.54 3.37 3.20 3.01 2.92 2.82 2.72 2.61 2.50 2.38 .005 

1.86 1.81 1.76 1.71 1.68 1.65 1.61 1.58 1.54 1.50 .100 | 26 
2.22 2.15 2.07 1.99 1.95 1.90 1.85 1.80 1.75 1.69 .050 
2.59 2.49 2.39 2.28 2.22 2.16 2.09 2.03 1.95 1.88 .025 
3.09 2.96 2.81 2.66 2.58 2.50 2.42 2.33 2.23 2.13 .010 
3.49 3.33 3.15 2.97 2.87 2.77 2.67 2.56 2.45 2.33 .005 

1.85 1.80 1.75 1.70 1.67 1.64 1.60 1.57 1.53 1.49 .100 | 27 
2.20 2.13 2.06 1.97 1.93 1.88 1.84 1.79 1.73 1.67 .050 
2.57 2.47 2.36 2.25 2.19 2.13 2.07 2.00 1.93 1.85 .025 
3.06 2.93 2.78 2.63 2.55 2.47 2.38 2.29 2.20 2.10 .010 
3.45 3.28 3.11 2.93 2.83 2.73 2.63 2.52 2.41 2.29 .005 

1.84 1.79 1.74 1.69 1.66 1.63 1.59 1.56 1.52 1.48 .100 | 28 
2.19 2.12 2.04 1.96 1.91 1.87 1.82 1.77 1.71 1.65 .050 
2.55 2.45 2.34 2.23 2.17 2.11 2.05 1.98 1.91 1.83 .025 
3.03 2.90 2.75 2.60 2.52 2.44 2.35 2.26 2.17 2.06 .010 
3.41 3.25 3.07 2.89 2.79 2.69 2.59 2.48 2.37 2.25 .005 
Table AIV.5 (continued) 
Numerator d.f. 
Denominator 
d.f. α 1 2 3 4 5 6 7 8 9 

29 .100 2.89 2.50 2.28 2.15 2.06 1.99 1.93 1.89 1.86 
.050 4.18 3.33 2.93 2.70 2.55 2.43 2.35 2.28 2.22 
.025 5.59 4.20 3.61 3.27 3.04 2.88 2.76 2.67 2.59 
.010 7.60 5.42 4.54 4.04 3.73 3.50 3.33 3.20 3.09 
.005 9.23 6.40 5.28 4.66 4.26 3.98 3.77 3.61 3.48 

30 .100 2.88 2.49 2.28 2.14 2.05 1.98 1.93 1.88 1.85 
.050 4.17 3.32 2.92 2.69 2.53 2.42 2.33 2.27 2.21 
.025 5.57 4.18 3.59 3.25 3.03 2.87 2.75 2.65 2.57 
.010 7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.07 
.005 9.18 6.35 5.24 4.62 4.23 3.95 3.74 3.58 3.45 

40 .100 2.84 2.44 2.23 2.09 2.00 1.93 1.87 1.83 1.79 
.050 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 
.025 5.42 4.05 3.46 3.13 2.90 2.74 2.62 2.53 2.45 
.010 7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.89 
.005 8.83 6.07 4.98 4.37 3.99 3.71 3.51 3.35 3.22 

60 .100 2.79 2.39 2.18 2.04 1.95 1.87 1.82 1.77 1.74 
.050 4.00 3.15 2.76 2.53 2.37 2.25 2.17 2.10 2.04 
.025 5.29 3.93 3.34 3.01 2.79 2.63 2.51 2.41 2.33 
.010 7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.72 
.005 8.49 5.79 4.73 4.14 3.76 3.49 3.29 3.13 3.01 

120 .100 2.75 2.35 2.13 1.99 1.90 1.82 1.77 1.72 1.68 
.050 3.92 3.07 2.68 2.45 2.29 2.17 2.09 2.02 1.96 
.025 5.15 3.80 3.23 2.89 2.67 2.52 2.39 2.30 2.22 
.010 6.85 4.79 3.95 3.48 3.17 2.96 2.79 2.66 2.56 
.005 8.18 5.54 4.50 3.92 3.55 3.28 3.09 2.93 2.81 

∞ .100 2.71 2.30 2.08 1.94 1.85 1.77 1.72 1.67 1.63 
.050 3.84 3.00 2.60 2.37 2.21 2.10 2.01 1.94 1.88 
.025 5.02 3.69 3.12 2.79 2.57 2.41 2.29 2.19 2.11 
.010 6.63 4.61 3.78 3.32 3.02 2.80 2.64 2.51 2.41 
.005 7.88 5.30 4.28 3.72 3.35 3.09 2.90 2.74 2.62 

(continued) 
Table AIV.5 (continued) 
Numerator d.f. 

10 12 15 20 24 30 40 60 120 ∞ α d.f. 

1.83 1.78 1.73 1.68 1.65 1.62 1.58 1.55 1.51 1.47 .100 29 
2.18 2.10 2.03 1.94 1.90 1.85 1.81 1.75 1.70 1.64 .050 
2.53 2.43 2.32 2.21 2.15 2.09 2.03 1.96 1.89 1.81 .025 
3.00 2.87 2.73 2.57 2.49 2.41 2.33 2.23 2.14 2.03 .010 
3.38 3.21 3.04 2.86 2.76 2.66 2.56 2.45 2.33 2.21 .005 

1.82 1.77 1.72 1.67 1.64 1.61 1.57 1.54 1.50 1.46 .100 30 
2.16 2.09 2.01 1.93 1.89 1.84 1.79 1.74 1.68 1.62 .050 
2.51 2.41 2.31 2.20 2.14 2.07 2.01 1.94 1.87 1.79 .025 
2.98 2.84 2.70 2.55 2.47 2.39 2.30 2.21 2.11 2.01 .010 
3.34 3.18 3.01 2.82 2.73 2.63 2.52 2.42 2.30 2.18 .005 

1.76 1.71 1.66 1.61 1.57 1.54 1.51 1.47 1.42 1.38 .100 40 
2.08 2.00 1.92 1.84 1.79 1.74 1.69 1.64 1.58 1.51 .050 
2.39 2.29 2.18 2.07 2.01 1.94 1.88 1.80 1.72 1.64 .025 
2.80 2.66 2.52 2.37 2.29 2.20 2.11 2.02 1.92 1.80 .010 
3.12 2.95 2.78 2.60 2.50 2.40 2.30 2.18 2.06 1.93 .005 

1.71 1.66 1.60 1.54 1.51 1.48 1.44 1.40 1.35 1.29 .100 60 
1.99 1.92 1.84 1.75 1.70 1.65 1.59 1.53 1.47 1.39 .050 
2.27 2.17 2.06 1.94 1.88 1.82 1.74 1.67 1.58 1.48 .025 
2.63 2.50 2.35 2.20 2.12 2.03 1.94 1.84 1.73 1.60 .010 
2.90 2.74 2.57 2.39 2.29 2.19 2.08 1.96 1.83 1.69 .005 

1.65 1.60 1.55 1.48 1.45 1.41 1.37 1.32 1.26 1.19 .100 120 
1.91 1.83 1.75 1.66 1.61 1.55 1.50 1.43 1.35 1.25 .050 
2.16 2.05 1.94 1.82 1.76 1.69 1.61 1.53 1.43 1.31 .025 
2.47 2.34 2.19 2.03 1.95 1.86 1.76 1.66 1.53 1.38 .010 
2.71 2.54 2.37 2.19 2.09 1.98 1.87 1.75 1.61 1.43 .005 

1.60 1.55 1.49 1.42 1.38 1.34 1.30 1.24 1.17 1.00 .100 ∞ 
1.83 1.75 1.67 1.57 1.52 1.46 1.39 1.32 1.22 1.00 .050 
2.05 1.94 1.83 1.71 1.64 1.57 1.48 1.39 1.27 1.00 .025 
2.32 2.18 2.04 1.88 1.79 1.70 1.59 1.47 1.32 1.00 .010 
2.52 2.36 2.19 2.00 1.90 1.79 1.67 1.53 1.36 1.00 .005 
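
A few corner entries of the F-table can be checked without any F-distribution routine. With 1 numerator degree of freedom and infinitely many denominator degrees of freedom, the F variable is the square of a standard normal variable, so the upper-α point equals the square of the upper-α/2 normal quantile. The sketch below relies on that identity and on Python's standard `statistics.NormalDist`; the function name `f_upper_1_inf` is hypothetical.

```python
from statistics import NormalDist

def f_upper_1_inf(alpha: float) -> float:
    """Upper-alpha point of F with (1, infinity) degrees of freedom."""
    # P(Z**2 > c) = alpha  <=>  sqrt(c) is the upper alpha/2 normal quantile.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z

# Compare with the infinite-denominator row, numerator d.f. = 1
# (for example 2.71, 3.84, 6.63, 7.88 at alpha = .100, .050, .010, .005).
for alpha in (0.100, 0.050, 0.025, 0.010, 0.005):
    print(f"alpha = {alpha:.3f}  F = {f_upper_1_inf(alpha):.2f}")
```

The same idea links the first column of every row to the t-table: the upper-α point of F with (1, ν) degrees of freedom is the square of the upper-α/2 point of t with ν degrees of freedom.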
Table AIV.6 Wilcoxon Signed Rank Test: P(W⁺ ≤ c) 
n 
c 3 4 5 6 7 8 9 10 11 
0 0.125 0.062 0.031 0.016 0.008 0.004 0.002 0.001 0.000 
1 0.250 0.125 0.062 0.031 0.016 0.008 0.004 0.002 0.001 
2 0.375 0.188 0.094 0.047 0.023 0.012 0.006 0.003 0.001 
3 0.625 0.312 0.156 0.078 0.039 0.020 0.010 0.005 0.002 
4 0.750 0.438 0.219 0.109 0.055 0.027 0.014 0.007 0.003 
5 0.875 0.562 0.312 0.156 0.078 0.039 0.020 0.010 0.005 
6 1.000 0.688 0.406 0.219 0.109 0.055 0.027 0.014 0.007 
7 0.812 0.500 0.281 0.148 0.074 0.037 0.019 0.009 
8 0.875 0.594 0.344 0.188 0.098 0.049 0.024 0.012 
9 0.938 0.688 0.422 0.234 0.125 0.064 0.032 0.016 
10 1.000 0.781 0.500 0.289 0.156 0.082 0.042 0.021 
11 0.844 0.578 0.344 0.191 0.102 0.053 0.027 
12 0.906 0.656 0.406 0.230 0.125 0.065 0.034 
13 0.938 0.719 0.469 0.273 0.150 0.080 0.042 
14 0.969 0.781 0.531 0.320 0.180 0.097 0.051 
15 1.000 0.844 0.594 0.371 0.213 0.116 0.062 
16 0.891 0.656 0.422 0.248 0.138 0.074 
17 0.922 0.711 0.473 0.285 0.161 0.087 
18 0.953 0.766 0.527 0.326 0.188 0.103 
19 0.969 0.812 0.578 0.367 0.216 0.120 
20 0.984 0.852 0.629 0.410 0.246 0.139 
21 1.000 0.891 0.680 0.455 0.278 0.160 
22 0.922 0.727 0.500 0.312 0.183 
23 0.945 0.770 0.545 0.348 0.207 
24 0.961 0.809 0.590 0.385 0.232 
25 0.977 0.844 0.633 0.423 0.260 
26 0.984 0.875 0.674 0.461 0.289 
27 0.992 0.902 0.715 0.500 0.319 
28 1.000 0.926 0.752 0.539 0.350 
29 0.945 0.787 0.577 0.382 
30 0.961 0.820 0.615 0.416 
31 0.973 0.850 0.652 0.449 
32 0.980 0.875 0.688 0.483 
33 0.988 0.898 0.722 0.517 
34 0.992 0.918 0.754 0.551 
35 0.996 0.936 0.784 0.584 
36 1.000 0.951 0.812 0.618 
37 0.963 0.839 0.650 
38 0.973 0.862 0.681 
39 0.980 0.884 0.711 
40 0.986 0.903 0.740 


(continued) 
Table AIV.6 (continued) 
n 
c 3 4 5 6 7 8 9 10 11 

41 0.990 0.920 0.768 
42 0.994 0.935 0.793 
43 0.996 0.947 0.817 
44 0.998 0.958 0.840 
45 1.000 0.968 0.861 
46 0.976 0.880 
47 0.981 0.897 
48 0.986 0.913 
49 0.990 0.926 
50 0.993 0.938 
51 0.995 0.949 
52 0.997 0.958 
53 0.998 0.966 
54 0.999 0.973 
55 1.000 0.979 
56 0.984 
57 0.988 
58 0.991 
59 0.993 
60 0.995 
61 0.997 
62 0.998 
63 0.999 
64 0.999 
65 1.000 
66 1.000 
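
The entries of the signed rank table can be reproduced by direct counting. Under the null hypothesis each rank 1, ..., n is included in W⁺ independently with probability 1/2, so P(W⁺ ≤ c) is the number of subsets of {1, ..., n} with sum at most c, divided by 2ⁿ. The following is a minimal sketch of that count, assuming no ties and no zero differences; the function name `signed_rank_cdf` is hypothetical.

```python
def signed_rank_cdf(n: int, c: int) -> float:
    """Exact P(W+ <= c) for the Wilcoxon signed rank statistic with n pairs."""
    if c < 0:
        return 0.0
    max_w = n * (n + 1) // 2
    # counts[s] = number of subsets of the ranks seen so far that sum to s.
    counts = [0] * (max_w + 1)
    counts[0] = 1
    for rank in range(1, n + 1):
        # Go downward so that each rank is used at most once per subset.
        for total in range(max_w, rank - 1, -1):
            counts[total] += counts[total - rank]
    return sum(counts[: c + 1]) / 2 ** n

# Compare with the tabulated entries, e.g. n = 5, c = 3 and n = 10, c = 10.
print(round(signed_rank_cdf(5, 3), 3))
print(round(signed_rank_cdf(10, 10), 3))
```

Because the distribution of W⁺ is symmetric about n(n + 1)/4, upper-tail probabilities follow from P(W⁺ ≥ c) = P(W⁺ ≤ n(n + 1)/2 − c).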
Table AIV.6 Wilcoxon Signed Rank Test: P(W⁺ ≤ c) (continued) 
n 
c 12 13 14 15 16 17 18 19 20 
0 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 
1 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 
2 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 
3 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 
4 0.002 0.001 0.000 0.000 0.000 0.000 0.000 0.000 0.000 
5 0.002 0.001 0.001 0.000 0.000 0.000 0.000 0.000 0.000 
6 0.003 0.002 0.001 0.000 0.000 0.000 0.000 0.000 0.000 
7 0.005 0.002 0.001 0.001 0.000 0.000 0.000 0.000 0.000 
8 0.006 0.003 0.002 0.001 0.000 0.000 0.000 0.000 0.000 
9 0.008 0.004 0.002 0.001 0.001 0.000 0.000 0.000 0.000 
10 0.010 0.005 0.003 0.001 0.001 0.000 0.000 0.000 0.000 
11 0.013 0.007 0.003 0.002 0.001 0.000 0.000 0.000 0.000 
12 0.017 0.009 0.004 0.002 0.001 0.001 0.000 0.000 0.000 
13 0.021 0.011 0.005 0.003 0.001 0.001 0.000 0.000 0.000 
14 0.026 0.013 0.007 0.003 0.002 0.001 0.000 0.000 0.000 
15 0.032 0.016 0.008 0.004 0.002 0.001 0.001 0.000 0.000 
16 0.039 0.020 0.010 0.005 0.003 0.001 0.001 0.000 0.000 
17 0.046 0.024 0.012 0.006 0.003 0.002 0.001 0.000 0.000 
18 0.055 0.029 0.015 0.008 0.004 0.002 0.001 0.000 0.000 
19 0.065 0.034 0.018 0.009 0.005 0.002 0.001 0.001 0.000 
20 0.076 0.040 0.021 0.011 0.005 0.003 0.001 0.001 0.000 
21 0.088 0.047 0.025 0.013 0.007 0.003 0.002 0.001 0.000 
22 0.102 0.055 0.029 0.015 0.008 0.004 0.002 0.001 0.001 
23 0.117 0.064 0.034 0.018 0.009 0.005 0.002 0.001 0.001 
24 0.133 0.073 0.039 0.021 0.011 0.005 0.003 0.001 0.001 
25 0.151 0.084 0.045 0.024 0.012 0.006 0.003 0.002 0.001 
26 0.170 0.095 0.052 0.028 0.014 0.007 0.004 0.002 0.001 
27 0.190 0.108 0.059 0.032 0.017 0.009 0.004 0.002 0.001 
28 0.212 0.122 0.068 0.036 0.019 0.010 0.005 0.003 0.001 
29 0.235 0.137 0.077 0.042 0.022 0.012 0.006 0.003 0.002 
30 0.259 0.153 0.086 0.047 0.025 0.013 0.007 0.004 0.002 
31 0.285 0.170 0.097 0.053 0.029 0.015 0.008 0.004 0.002 
32 0.311 0.188 0.108 0.060 0.033 0.017 0.009 0.005 0.002 
33 0.339 0.207 0.121 0.068 0.037 0.020 0.010 0.005 0.003 
34 0.367 0.227 0.134 0.076 0.042 0.022 0.012 0.006 0.003 
35 0.396 0.249 0.148 0.084 0.047 0.025 0.013 0.007 0.004 
36 0.425 0.271 0.163 0.094 0.052 0.028 0.015 0.008 0.004 
37 0.455 0.294 0.179 0.104 0.058 0.032 0.017 0.009 0.005 
38 0.485 0.318 0.195 0.115 0.065 0.036 0.019 0.010 0.005 
39 0.515 0.342 0.213 0.126 0.072 0.040 0.022 0.011 0.006 
40 0.545 0.368 0.232 0.138 0.080 0.044 0.024 0.013 0.007 
41 0.575 0.393 0.251 0.151 0.088 0.049 0.027 0.014 0.008 
42 0.604 0.420 0.271 0.165 0.096 0.054 0.030 0.016 0.009 
Table AIV.6 (continued) 
n 
c 12 13 14 15 16 17 18 19 20 

43 0.633 0.446 0.292 0.180 0.106 0.060 0.033 0.018 0.010 
44 0.661 0.473 0.313 0.195 0.116 0.066 0.037 0.020 0.011 
45 0.689 0.500 0.335 0.211 0.126 0.073 0.041 0.022 0.012 
46 0.715 0.527 0.357 0.227 0.137 0.080 0.045 0.025 0.013 
47 0.741 0.554 0.380 0.244 0.149 0.087 0.049 0.027 0.015 
48 0.765 0.580 0.404 0.262 0.161 0.095 0.054 0.030 0.016 
49 0.788 0.607 0.428 0.281 0.174 0.103 0.059 0.033 0.018 
50 0.810 0.632 0.452 0.300 0.188 0.112 0.065 0.036 0.020 
51 0.830 0.658 0.476 0.319 0.202 0.122 0.071 0.040 0.022 
52 0.849 0.682 0.500 0.339 0.217 0.132 0.077 0.044 0.024 
53 0.867 0.706 0.524 0.360 0.232 0.142 0.084 0.048 0.027 
54 0.883 0.729 0.548 0.381 0.248 0.153 0.091 0.052 0.029 
55 0.898 0.751 0.572 0.402 0.264 0.164 0.098 0.057 0.032 
56 0.912 0.773 0.596 0.423 0.281 0.176 0.106 0.062 0.035 
57 0.924 0.793 0.620 0.445 0.298 0.189 0.114 0.067 0.038 
58 0.935 0.812 0.643 0.467 0.316 0.202 0.123 0.072 0.041 
59 0.945 0.830 0.665 0.489 0.334 0.215 0.132 0.078 0.045 
60 0.954 0.847 0.687 0.511 0.353 0.229 0.142 0.084 0.049 
61 0.961 0.863 0.708 0.533 0.372 0.244 0.152 0.091 0.053 
62 0.968 0.878 0.729 0.555 0.391 0.259 0.162 0.098 0.057 
63 0.974 0.892 0.749 0.577 0.410 0.274 0.173 0.105 0.062 
64 0.979 0.905 0.768 0.598 0.430 0.290 0.185 0.113 0.066 
65 0.983 0.916 0.787 0.619 0.450 0.306 0.196 0.121 0.071 
66 0.987 0.927 0.805 0.640 0.470 0.322 0.209 0.129 0.077 
67 0.990 0.936 0.821 0.661 0.490 0.339 0.221 0.138 0.082 
68 0.992 0.945 0.837 0.681 0.510 0.356 0.234 0.147 0.088 
69 0.994 0.953 0.852 0.700 0.530 0.373 0.248 0.156 0.095 
70 0.995 0.960 0.866 0.719 0.550 0.391 0.261 0.166 0.101 
71 0.997 0.966 0.879 0.738 0.570 0.409 0.275 0.176 0.108 
72 0.998 0.971 0.892 0.756 0.590 0.427 0.290 0.187 0.115 
73 0.998 0.976 0.903 0.773 0.609 0.445 0.305 0.198 0.123 
74 0.999 0.980 0.914 0.789 0.628 0.463 0.320 0.209 0.131 
75 0.999 0.984 0.923 0.805 0.647 0.482 0.335 0.221 0.139 
76 1.000 0.987 0.932 0.820 0.666 0.500 0.351 0.233 0.147 
77 1.000 0.989 0.941 0.835 0.684 0.518 0.367 0.245 0.156 
78 1.000 0.991 0.948 0.849 0.702 0.537 0.383 0.258 0.165 
79 0.993 0.955 0.862 0.719 0.555 0.399 0.271 0.174 
80 0.995 0.961 0.874 0.736 0.573 0.416 0.284 0.184
81 0.996 0.966 0.885 0.752 0.591 0.433 0.297 0.194 
82 0.997 0.971 0.896 0.768 0.609 0.449 0.311 0.205 
83 0.998 0.975 0.906 0.783 0.627 0.466 0.325 0.215 
84 0.998 0.979 0.916 0.798 0.644 0.483 0.340 0.226 
85 0.999 0.982 0.924 0.812 0.661 0.500 0.354 0.237 
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Table AIV.6 (continued) 
n 
c 12 13 14 15 16 17 18 19 20 
86 0.999 0.985 0.932 0.826 0.678 0.517 0.369 0.249 
87 0.999 0.988 0.940 0.839 0.694 0.534 0.384 0.261 
88 1.000 0.990 0.947 0.851 0.710 0.551 0.399 0.273 
89 1.000 0.992 0.953 0.863 0.726 0.567 0.414 0.285 
90 1.000 0.993 0.958 0.874 0.741 0.584 0.430 0.298 
91 1.000 0.995 0.964 0.884 0.756 0.601 0.445 0.311 
92 0.996 0.968 0.894 0.771 0.617 0.461 0.324 
93 0.997 0.972 0.904 0.785 0.633 0.476 0.337 
94 0.997 0.976 0.912 0.798 0.649 0.492 0.351 
95 0.998 0.979 0.920 0.811 0.665 0.508 0.364 
96 0.998 0.982 0.928 0.824 0.680 0.524 0.378 
97 0.999 0.985 0.935 0.836 0.695 0.539 0.392 
98 0.999 0.987 0.942 0.847 0.710 0.555 0.406 
99 0.999 0.989 0.948 0.858 0.725 0.570 0.420 
100 1.000 0.991 0.953 0.868 0.739 0.586 0.435 
101 1.000 0.992 0.958 0.878 0.752 0.601 0.449 
102 1.000 0.994 0.963 0.888 0.766 0.616 0.464 
103 1.000 0.995 0.967 0.897 0.779 0.631 0.478 
104 1.000 0.996 0.971 0.905 0.791 0.646 0.493 
105 1.000 0.997 0.975 0.913 0.804 0.660 0.507 
106 0.997 0.978 0.920 0.815 0.675 0.522 
107 0.998 0.981 0.927 0.827 0.689 0.536 
108 0.998 0.983 0.934 0.838 0.703 0.551 
109 0.999 0.986 0.940 0.848 0.716 0.565 
110 0.999 0.988 0.946 0.858 0.729 0.580 
111 0.999 0.989 0.951 0.868 0.742 0.594 
112 0.999 0.991 0.956 0.877 0.755 0.608 
113 1.000 0.992 0.960 0.886 0.767 0.622 
114 1.000 0.993 0.964 0.894 0.779 0.636 
115 1.000 0.995 0.968 0.902 0.791 0.649 
116 1.000 0.995 0.972 0.909 0.802 0.663 
117 1.000 0.996 0.975 0.916 0.813 0.676 
118 1.000 0.997 0.978 0.923 0.824 0.689 
119 1.000 0.997 0.980 0.929 0.834 0.702 
120 1.000 0.998 0.983 0.935 0.844 0.715 
121 0.998 0.985 0.941 0.853 0.727 
122 0.999 0.987 0.946 0.862 0.739 
123 0.999 0.988 0.951 0.871 0.751 
124 0.999 0.990 0.955 0.879 0.763 
125 0.999 0.991 0.959 0.887 0.774
126 0.999 0.993 0.963 0.895 0.785 
127 1.000 0.994 0.967 0.902 0.795 
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Table AIV.6 (continued) 
n 
c 12 13 14 15 16 17 18 19 20 

128 1.000 0.995 0.970 0.909 0.806 
129 1.000 0.995 0.973 0.916 0.816 
130 1.000 0.996 0.976 0.922 0.826 
131 1.000 0.997 0.978 0.928 0.835 
132 1.000 0.997 0.981 0.933 0.844 
133 1.000 0.998 0.983 0.938 0.853 
134 1.000 0.998 0.985 0.943 0.861 
135 1.000 0.998 0.987 0.948 0.869 
136 1.000 0.999 0.988 0.952 0.877 
137 0.999 0.990 0.956 0.885 
138 0.999 0.991 0.960 0.892 
139 0.999 0.992 0.964 0.899 
140 0.999 0.993 0.967 0.905 
141 1.000 0.994 0.970 0.912 
142 1.000 0.995 0.973 0.918 
143 1.000 0.996 0.975 0.923 
144 1.000 0.996 0.978 0.929 
145 1.000 0.997 0.980 0.934 
146 1.000 0.997 0.982 0.938 
147 1.000 0.998 0.984 0.943 
148 1.000 0.998 0.986 0.947 
149 1.000 0.998 0.987 0.951 
150 1.000 0.999 0.989 0.955 
151 1.000 0.999 0.990 0.959 
152 1.000 0.999 0.991 0.962 
153 1.000 0.999 0.992 0.965 
154 0.999 0.993 0.968 
155 0.999 0.994 0.971 
156 1.000 0.995 0.973 
157 1.000 0.995 0.976 
158 1.000 0.996 0.978 
159 1.000 0.996 0.980 
160 1.000 0.997 0.982 
161 1.000 0.997 0.984 
162 1.000 0.998 0.985 
163 1.000 0.998 0.987 
164 1.000 0.998 0.988 
165 1.000 0.999 0.989 
166 1.000 0.999 0.990 
167 1.000 0.999 0.991 
168 1.000 0.999 0.992 
169 1.000 0.999 0.993 
170 1.000 0.999 0.994 
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Table AIV.6 (continued) 
n 
c 12 13 14 15 16 17 18 19 20 

171 1.000 1.000 0.995 
172 1.000 0.995 
173 1.000 0.996 
174 1.000 0.996 
175 1.000 0.997 
176 1.000 0.997 
177 1.000 0.998 
178 1.000 0.998 
179 1.000 0.998 
180 1.000 0.998 
181 1.000 0.999 
182 1.000 0.999 
183 1.000 0.999 
184 1.000 0.999 
185 1.000 0.999 
186 1.000 0.999 
187 1.000 0.999 
188 1.000 1.000 
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Table AIV.7 Wilcoxon Rank Sum Test 
$P(W \le a)$
$H_0: \mu_1 = \mu_2$
$n_1 = n$, $n_2 = m$, where $n \le m$
$W = \sum_{i=1}^{m} R_i - \frac{1}{2} m(m+1)$
n = 3
m
a 3 4 5 6 7 8 9 10
0 0.0500 0.0286 0.0179 0.0119 0.0083 0.0061 0.0045 0.0035 
1 0.1000 0.0571 0.0357 0.0238 0.0167 0.0121 0.0091 0.0070 
2 0.2000 0.1143 0.0714 0.0476 0.0333 0.0242 0.0182 0.0140 
3 0.3500 0.2000 0.1250 0.0833 0.0583 0.0424 0.0318 0.0245 
4 0.5000 0.3143 0.1964 0.1310 0.0917 0.0667 0.0500 0.0385 
5 0.6500 0.4286 0.2857 0.1905 0.1333 0.0970 0.0727 0.0559 
6 0.8000 0.5714 0.3929 0.2738 0.1917 0.1394 0.1045 0.0804 
7 0.9000 0.6857 0.5000 0.3571 0.2583 0.1879 0.1409 0.1084 
8 0.9500 0.8000 0.6071 0.4524 0.3333 0.2485 0.1864 0.1434 
9 1.0000 0.8857 0.7143 0.5476 0.4167 0.3152 0.2409 0.1853 
10 0.9429 0.8036 0.6429 0.5000 0.3879 0.3000 0.2343 
11 0.9714 0.8750 0.7262 0.5833 0.4606 0.3636 0.2867 
12 1.0000 0.9286 0.8095 0.6667 0.5394 0.4318 0.3462 
13 0.9643 0.8690 0.7417 0.6121 0.5000 0.4056 
14 0.9821 0.9167 0.8083 0.6848 0.5682 0.4685 
15 1.0000 0.9524 0.8667 0.7515 0.6364 0.5315 
16 0.9762 0.9083 0.8121 0.7000 0.5944 
17 0.9881 0.9417 0.8606 0.7591 0.6538 
18 1.0000 0.9667 0.9030 0.8136 0.7133 
19 0.9833 0.9333 0.8591 0.7657 
20 0.9917 0.9576 0.8955 0.8147 
21 1.0000 0.9758 0.9273 0.8566 
22 0.9879 0.9500 0.8916 
23 0.9939 0.9682 0.9196 
24 1.0000 0.9818 0.9441 
25 0.9909 0.9615 
26 0.9955 0.9755 
27 1.0000 0.9860 
28 0.9930 
29 0.9965 


30 1.0000
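The cumulative probabilities in this table can be checked by direct counting. The following is a minimal Python sketch, not the authors' procedure. It assumes that W is the sum of the pooled-sample ranks of the sample of size m, minus m(m + 1)/2, and that under H_0 every choice of m ranks out of n + m is equally likely. The function name rank_sum_cdf is hypothetical.

```python
from math import comb


def rank_sum_cdf(n, m):
    """Return a list cdf with cdf[a] = P(W <= a) under H0, for a = 0..n*m.

    Assumption: W = (sum of the ranks of the sample of size m) - m(m+1)/2,
    with all C(n+m, m) rank subsets equally likely and no ties.
    """
    N = n + m
    max_sum = sum(range(n + 1, N + 1))  # largest possible rank sum
    # ways[j][s] = number of j-element subsets of the ranks seen so far
    # whose rank sum equals s
    ways = [[0] * (max_sum + 1) for _ in range(m + 1)]
    ways[0][0] = 1
    for r in range(1, N + 1):
        # go through subset sizes downward so each rank is used at most once
        for j in range(min(r, m), 0, -1):
            for s in range(r, max_sum + 1):
                ways[j][s] += ways[j - 1][s - r]
    offset = m * (m + 1) // 2  # smallest possible rank sum
    total = comb(N, m)
    cdf, running = [], 0
    for a in range(n * m + 1):
        running += ways[m][a + offset]
        cdf.append(running / total)
    return cdf


if __name__ == "__main__":
    # n = m = 3: the first column of the table above
    print([round(p, 4) for p in rank_sum_cdf(3, 3)])
```

For n = m = 3 the first entry is 1/20 = 0.0500 and the entry at a = 4 is 0.5000, in agreement with the first column above.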
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Table AIV.7 (continued) 
$P(W \le a)$
$H_0: \mu_1 = \mu_2$
$n_1 = n$, $n_2 = m$, where $n \le m$
$W = \sum_{i=1}^{m} R_i - \frac{1}{2} m(m+1)$
n = 4
m
a 4 5 6 7 8 9 10
0 0.0143 0.0079 0.0048 0.0030 0.0020 0.0014 0.0010
1 0.0286 0.0159 0.0095 0.0061 0.0040 0.0028 0.0020
2 0.0571 0.0317 0.0190 0.0121 0.0081 0.0056 0.0040
3 0.1000 0.0556 0.0333 0.0212 0.0141 0.0098 0.0070
4 0.1714 0.0952 0.0571 0.0364 0.0242 0.0168 0.0120
5 0.2429 0.1429 0.0857 0.0545 0.0364 0.0252 0.0180
6 0.3429 0.2063 0.1286 0.0818 0.0545 0.0378 0.0270
7 0.4429 0.2778 0.1762 0.1152 0.0768 0.0531 0.0380
8 0.5571 0.3651 0.2381 0.1576 0.1071 0.0741 0.0529
9 0.6571 0.4524 0.3048 0.2061 0.1414 0.0993 0.0709
10 0.7571 0.5476 0.3810 0.2636 0.1838 0.1301 0.0939
11 0.8286 0.6349 0.4571 0.3242 0.2303 0.1650 0.1199
12 0.9000 0.7222 0.5429 0.3939 0.2848 0.2070 0.1518 
13 0.9429 0.7937 0.6190 0.4636 0.3414 0.2517 0.1868 
14 0.9714 0.8571 0.6952 0.5364 0.4040 0.3021 0.2268 
15 0.9857 0.9048 0.7619 0.6061 0.4667 0.3552 0.2697 
16 1.0000 0.9444 0.8238 0.6758 0.5333 0.4126 0.3177 
17 0.9683 0.8714 0.7364 0.5960 0.4699 0.3666 
18 0.9841 0.9143 0.7939 0.6586 0.5301 0.4196 
19 0.9921 0.9429 0.8424 0.7152 0.5874 0.4725 
20 1.0000 0.9667 0.8848 0.7697 0.6448 0.5275 
21 0.9810 0.9182 0.8162 0.6979 0.5804 
22 0.9905 0.9455 0.8586 0.7483 0.6334 
23 0.9952 0.9636 0.8929 0.7930 0.6823 
24 1.0000 0.9788 0.9232 0.8350 0.7303 
25 0.9879 0.9455 0.8699 0.7732 
26 0.9939 0.9636 0.9007 0.8132 
27 0.9970 0.9758 0.9259 0.8482 
28 1.0000 0.9859 0.9469 0.8801 
29 0.9919 0.9622 0.9061 
30 0.9960 0.9748 0.9291 
31 0.9980 0.9832 0.9471 
32 1.0000 0.9902 0.9620 
33 0.9944 0.9730 
34 0.9972 0.9820 
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Table AIV.7 (continued) 
$P(W \le a)$
$H_0: \mu_1 = \mu_2$
$n_1 = n$, $n_2 = m$, where $n \le m$
$W = \sum_{i=1}^{m} R_i - \frac{1}{2} m(m+1)$
n = 4
m
a 4 5 6 7 8 9 10
35 0.9986 0.9880 
36 1.0000 0.9930 
37 0.9960 
38 0.9980 
39 0.9990 
n = 5
m
a 5 6 7 8 9 10 a 5 6 7 8 9 10
0 0.0040 0.0022 0.0013 0.0008 0.0005 0.0003 26 0.9848 0.9255 0.8228 0.6968 0.5704
1 0.0079 0.0043 0.0025 0.0016 0.0010 0.0007 27 0.9913 0.9470 0.8578 0.7408 0.6161
2 0.0159 0.0087 0.0051 0.0031 0.0020 0.0013 28 0.9957 0.9634 0.8889 0.7812 0.6607
3 0.0278 0.0152 0.0088 0.0054 0.0035 0.0023 29 0.9978 0.9760 0.9145 0.8182 0.7030
4 0.0476 0.0260 0.0152 0.0093 0.0060 0.0040 30 1.0000 0.9848 0.9363 0.8511 0.7433
5 0.0754 0.0411 0.0240 0.0148 0.0095 0.0063 31 0.9912 0.9534 0.8801 0.7802
6 0.1111 0.0628 0.0366 0.0225 0.0145 0.0097 32 0.9949 0.9674 0.9051 0.8145
7 0.1548 0.0887 0.0530 0.0326 0.0210 0.0140 33 0.9975 0.9775 0.9266 0.8452
8 0.2103 0.1234 0.0745 0.0466 0.0300 0.0200 34 0.9987 0.9852 0.9441 0.8728
9 0.2738 0.1645 0.1010 0.0637 0.0415 0.0276 35 1.0000 0.9907 0.9585 0.8968
10 0.3452 0.2143 0.1338 0.0855 0.0559 0.0376 36 0.9946 0.9700 0.9177
11 0.4206 0.2684 0.1717 0.1111 0.0734 0.0496 37 0.9969 0.9790 0.9354
12 0.5000 0.3312 0.2159 0.1422 0.0949 0.0646 38 0.9984 0.9855 0.9504
13 0.5794 0.3961 0.2652 0.1772 0.1199 0.0823 39 0.9992 0.9905 0.9624
14 0.6548 0.4654 0.3194 0.2176 0.1489 0.1032 40 1.0000 0.9940 0.9724
15 0.7262 0.5346 0.3775 0.2618 0.1818 0.1272 41 0.9965 0.9800
16 0.7897 0.6039 0.4381 0.3108 0.2188 0.1548 42 0.9980 0.9860
17 0.8452 0.6688 0.5000 0.3621 0.2592 0.1855 43 0.9990 0.9903
18 0.8889 0.7316 0.5619 0.4165 0.3032 0.2198 44 0.9995 0.9937
19 0.9246 0.7857 0.6225 0.4716 0.3497 0.2567 45 1.0000 0.9960
20 0.9524 0.8355 0.6806 0.5284 0.3986 0.2970 46 0.9977
21 0.9722 0.8766 0.7348 0.5835 0.4491 0.3393 47 0.9987
22 0.9841 0.9113 0.7841 0.6379 0.5000 0.3839 48 0.9993
23 0.9921 0.9372 0.8283 0.6892 0.5509 0.4296 49 0.9997
24 0.9960 0.9589 0.8662 0.7382 0.6014 0.4765 50 1.0000
25 1.0000 0.9740 0.8990 0.7824 0.6503 0.5235
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Table AIV.7 (continued) 
$P(W \le a)$
$H_0: \mu_1 = \mu_2$
$n_1 = n$, $n_2 = m$, where $n \le m$
$W = \sum_{i=1}^{m} R_i - \frac{1}{2} m(m+1)$
n = 6 n = 6
m m
a 6 7 8 9 10 a 6 7 8 9 10
0 0.0011 0.0006 0.0003 0.0002 0.0001 31 0.9870 0.9312 0.8275 0.6965 0.5626
1 0.0022 0.0012 0.0007 0.0004 0.0002 32 0.9924 0.9493 0.8588 0.7357 0.6038 
2 0.0043 0.0023 0.0013 0.0008 0.0005 33 0.9957 0.9633 0.8858 0.7720 0.6436 
3 0.0076 0.0041 0.0023 0.0014 0.0009 34 0.9978 0.9744 0.9094 0.8058 0.6823 
4 0.0130 0.0070 0.0040 0.0024 0.0015 35 0.9989 0.9825 0.9291 0.8362 0.7189
5 0.0206 0.0111 0.0063 0.0038 0.0024 36 1.0000 0.9889 0.9461 0.8639 0.7539 
6 0.0325 0.0175 0.0100 0.0060 0.0037 37 0.9930 0.9594 0.8881 0.7861 
7 0.0465 0.0256 0.0147 0.0088 0.0055 38 0.9959 0.9704 0.9095 0.8162 
8 0.0660 0.0367 0.0213 0.0128 0.0080 39 0.9977 0.9787 0.9277 0.8434 
9 0.0898 0.0507 0.0296 0.0180 0.0112 40 0.9988 0.9853 0.9433 0.8683 
10 0.1201 0.0688 0.0406 0.0248 0.0156 41 0.9994 0.9900 0.9560 0.8901
11 0.1548 0.0903 0.0539 0.0332 0.0210 42 1.0000 0.9937 0.9668 0.9097
12 0.1970 0.1171 0.0709 0.0440 0.0280 43 0.9960 0.9752 0.9264 
13 0.2424 0.1474 0.0906 0.0567 0.0363 44 0.9977 0.9820 0.9411
14 0.2944 0.1830 0.1142 0.0723 0.0467 45 0.9987 0.9872 0.9533 
15 0.3496 0.2226 0.1412 0.0905 0.0589 46 0.9993 0.9912 0.9637 
16 0.4091 0.2669 0.1725 0.1119 0.0736 47 0.9997 0.9940 0.9720 
17 0.4686 0.3141 0.2068 0.1361 0.0903 48 1.0000 0.9962 0.9790 
18 0.5314 0.3654 0.2454 0.1638 0.1099 49 0.9976 0.9844 
19 0.5909 0.4178 0.2864 0.1942 0.1317 50 0.9986 0.9888 
20 0.6504 0.4726 0.3310 0.2280 0.1566 51 0.9992 0.9920 
21 0.7056 0.5274 0.3773 0.2643 0.1838 52 0.9996 0.9945 
22 0.7576 0.5822 0.4259 0.3035 0.2139 53 0.9998 0.9963
23 0.8030 0.6346 0.4749 0.3445 0.2461 54 1.0000 0.9976 
24 0.8452 0.6859 0.5251 0.3878 0.2811 55 0.9985 
25 0.8799 0.7331 0.5741 0.4320 0.3177 56 0.9991 
26 0.9102 0.7774 0.6227 0.4773 0.3564 57 0.9995 
27 0.9340 0.8170 0.6690 0.5227 0.3962 58 0.9998 
28 0.9535 0.8526 0.7136 0.5680 0.4374 59 0.9999 
29 0.9675 0.8829 0.7546 0.6122 0.4789 60 1.0000 
30 0.9794 0.9097 0.7932 0.6555 0.5211 
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Table AIV.7 (continued) 
$P(W \le a)$
$H_0: \mu_1 = \mu_2$
$n_1 = n$, $n_2 = m$, where $n \le m$
$W = \sum_{i=1}^{m} R_i - \frac{1}{2} m(m+1)$
n = 7 n = 7
m m
a 7 8 9 10 a 7 8 9 10
0 0.0003 0.0002 0.0001 0.0001 36 0.9359 0.8322 0.6968 0.5566
1 0.0006 0.0003 0.0002 0.0001 37 0.9513 0.8595 0.7320 0.5937
2 0.0012 0.0006 0.0003 0.0002 38 0.9636 0.8841 0.7651 0.6302
3 0.0020 0.0011 0.0006 0.0004 39 0.9735 0.9054 0.7961 0.6655
4 0.0035 0.0019 0.0010 0.0006 40 0.9811 0.9240 0.8245 0.6996
5 0.0055 0.0030 0.0017 0.0010 41 0.9869 0.9397 0.8504 0.7319
6 0.0087 0.0047 0.0026 0.0015 42 0.9913 0.9531 0.8739 0.7626
7 0.0131 0.0070 0.0039 0.0023 43 0.9945 0.9639 0.8948 0.7913
8 0.0189 0.0103 0.0058 0.0034 44 0.9965 0.9730 0.9131 0.8181
9 0.0265 0.0145 0.0082 0.0048 45 0.9980 0.9800 0.9292 0.8426
10 0.0364 0.0200 0.0115 0.0068 46 0.9988 0.9855 0.9429 0.8651
11 0.0487 0.0270 0.0156 0.0093 47 0.9994 0.9897 0.9546 0.8852
12 0.0641 0.0361 0.0209 0.0125 48 0.9997 0.9930 0.9644 0.9034 
13 0.0825 0.0469 0.0274 0.0165 49 1.0000 0.9953 0.9726 0.9194 
14 0.1043 0.0603 0.0356 0.0215 50 0.9970 0.9791 0.9335 
15 0.1297 0.0760 0.0454 0.0277 51 0.9981 0.9844 0.9456 
16 0.1588 0.0946 0.0571 0.0351 52 0.9989 0.9885 0.9561 
17 0.1914 0.1159 0.0708 0.0439 53 0.9994 0.9918 0.9649 
18 0.2279 0.1405 0.0869 0.0544 54 0.9997 0.9942 0.9723 
19 0.2675 0.1678 0.1052 0.0665 55 0.9998 0.9961 0.9785 
20 0.3100 0.1984 0.1261 0.0806 56 1.0000 0.9974 0.9835 
21 0.3552 0.2317 0.1496 0.0966 57 0.9983 0.9875 
22 0.4024 0.2679 0.1755 0.1148 58 0.9990 0.9907 
23 0.4508 0.3063 0.2039 0.1349 59 0.9994 0.9932 
24 0.5000 0.3472 0.2349 0.1574 60 0.9997 0.9952 
25 0.5492 0.3894 0.2680 0.1819 61 0.9998 0.9966 
26 0.5976 0.4333 0.3032 0.2087 62 0.9999 0.9977 
27 0.6448 0.4775 0.3403 0.2374 63 1.0000 0.9985
28 0.6900 0.5225 0.3788 0.2681 64 0.9990 
29 0.7325 0.5667 0.4185 0.3004 65 0.9994 
30 0.7721 0.6106 0.4591 0.3345 66 0.9996 
31 0.8086 0.6528 0.5000 0.3698 67 0.9998 
32 0.8412 0.6937 0.5409 0.4063 68 0.9999 
33 0.8703 0.7321 0.5815 0.4434 69 0.9999 
34 0.8957 0.7683 0.6212 0.4811 70 1.0000 
35 0.9175 0.8016 0.6597 0.5189 
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Table AIV.7 (continued) 
$P(W \le a)$
$H_0: \mu_1 = \mu_2$
$n_1 = n$, $n_2 = m$, where $n \le m$
$W = \sum_{i=1}^{m} R_i - \frac{1}{2} m(m+1)$
n = 8 n = 8 n = 8
m m m
a 8 9 10 a 8 9 10 a 8 9 10
0 0.0001 0.0000 0.0000 28 0.3605 0.2404 0.1577 56 0.9965 0.9768 0.9271 
1 0.0002 0.0001 0.0000 29 0.3992 0.2707 0.1800 57 0.9977 0.9820 0.9390
2 0.0003 0.0002 0.0001 30 0.4392 0.3029 0.2041 58 0.9985 0.9863 0.9494 
3 0.0005 0.0003 0.0002 31 0.4796 0.3365 0.2299 59 0.9991 0.9897 0.9584 
4 0.0009 0.0005 0.0003 32 0.5204 0.3715 0.2574 60 0.9995 0.9924 0.9662 
5 0.0015 0.0008 0.0004 33 0.5608 0.4074 0.2863 61 0.9997 0.9944 0.9727 
6 0.0023 0.0012 0.0007 34 0.6008 0.4442 0.3167 62 0.9998 0.9961 0.9783 
7 0.0035 0.0019 0.0010 35 0.6395 0.4813 0.3482 63 0.9999 0.9972 0.9829 
8 0.0052 0.0028 0.0015 36 0.6773 0.5187 0.3809 64 1.0000 0.9981 0.9867 
9 0.0074 0.0039 0.0022 37 0.7131 0.5558 0.4143 65 0.9988 0.9897 
10 0.0103 0.0056 0.0031 38 0.7473 0.5926 0.4484 66 0.9992 0.9922 
11 0.0141 0.0076 0.0043 39 0.7791 0.6285 0.4827 67 0.9995 0.9942 
12 0.0190 0.0103 0.0058 40 0.8089 0.6635 0.5173 68 0.9997 0.9957 
13 0.0249 0.0137 0.0078 41 0.8359 0.6971 0.5516 69 0.9998 0.9969
14 0.0325 0.0180 0.0103 42 0.8607 0.7293 0.5857 70 0.9999 0.9978 
15 0.0415 0.0232 0.0133 43 0.8828 0.7596 0.6191 71 1.0000 0.9985 
16 0.0524 0.0296 0.0171 44 0.9026 0.7883 0.6518 72 1.0000 0.9990 
17 0.0652 0.0372 0.0217 45 0.9197 0.8148 0.6833 73 0.9993 
18 0.0803 0.0464 0.0273 46 0.9348 0.8394 0.7137 74 0.9996 
19 0.0974 0.0570 0.0338 47 0.9476 0.8617 0.7426 75 0.9997 
20 0.1172 0.0694 0.0416 48 0.9585 0.8821 0.7701 76 0.9998 
21 0.1393 0.0836 0.0506 49 0.9675 0.9002 0.7959 77 0.9999 
22 0.1641 0.0998 0.0610 50 0.9751 0.9164 0.8200 78 1.0000 
23 0.1911 0.1179 0.0729 51 0.9810 0.9306 0.8423 79 1.0000
24 0.2209 0.1383 0.0864 52 0.9859 0.9430 0.8629 80 1.0000 
25 0.2527 0.1606 0.1015 53 0.9897 0.9536 0.8815 
26 0.2869 0.1852 0.1185 54 0.9926 0.9628 0.8985 
27 0.3227 0.2117 0.1371 55 0.9948 0.9704 0.9136
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Table AIV.7 (continued) 
$P(W \le a)$
$H_0: \mu_1 = \mu_2$
$n_1 = n$, $n_2 = m$, where $n \le m$
$W = \sum_{i=1}^{m} R_i - \frac{1}{2} m(m+1)$
n = 9 n = 9 n = 9
m m m
a 9 10 a 9 10 a 9 10
0 0.0000 0.0000 31 0.2181 0.1388 62 0.9748 0.9218 
1 0.0000 0.0000 32 0.2447 0.1577 63 0.9800 0.9333 
2 0.0001 0.0000 33 0.2729 0.1781 64 0.9843 0.9436 
3 0.0001 0.0001 34 0.3024 0.2001 65 0.9878 0.9526 
4 0.0002 0.0001 35 0.3332 0.2235 66 0.9906 0.9606 
5 0.0004 0.0002 36 0.3652 0.2483 67 0.9929 0.9674 
6 0.0006 0.0003 37 0.3981 0.2745 68 0.9947 0.9733 
7 0.0009 0.0005 38 0.4317 0.3019 69 0.9961 0.9783 
8 0.0014 0.0007 39 0.4657 0.3304 70 0.9972 0.9825 
9 0.0020 0.0011 40 0.5000 0.3598 71 0.9980 0.9860 
10 0.0028 0.0015 41 0.5343 0.3901 72 0.9986 0.9890 
11 0.0039 0.0021 42 0.5683 0.4211 73 0.9991 0.9914
12 0.0053 0.0028 43 0.6019 0.4524 74 0.9994 0.9934 
13 0.0071 0.0038 44 0.6348 0.4841 75 0.9996 0.9949 
14 0.0094 0.0051 45 0.6668 0.5159 76 0.9998 0.9962 
15 0.0122 0.0066 46 0.6976 0.5476 77 0.9999 0.9972 
16 0.0157 0.0086 47 0.7271 0.5789 78 0.9999 0.9979 
17 0.0200 0.0110 48 0.7553 0.6099 79 1.0000 0.9985 
18 0.0252 0.0140 49 0.7819 0.6402 80 1.0000 0.9989 
19 0.0313 0.0175 50 0.8067 0.6696 81 1.0000 0.9993 
20 0.0385 0.0217 51 0.8299 0.6981 82 0.9995 
21 0.0470 0.0267 52 0.8513 0.7255 83 0.9997 
22 0.0567 0.0326 53 0.8710 0.7517 84 0.9998 
23 0.0680 0.0394 54 0.8888 0.7765 85 0.9999 
24 0.0807 0.0474 55 0.9049 0.7999 86 0.9999 
25 0.0951 0.0564 56 0.9193 0.8219 87 1.0000 
26 0.1112 0.0667 57 0.9320 0.8423 88 1.0000 
27 0.1290 0.0782 58 0.9433 0.8612 89 1.0000 
28 0.1487 0.0912 59 0.9530 0.8786 90 1.0000 
29 0.1701 0.1055 60 0.9615 0.8945 
30 0.1933 0.1214 61 0.9687 0.9088 
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Table AIV.7 (continued) 
$P(W \le a)$
$H_0: \mu_1 = \mu_2$
$n_1 = n$, $n_2 = m$, where $n \le m$
$W = \sum_{i=1}^{m} R_i - \frac{1}{2} m(m+1)$
n = 10 n = 10 n = 10
m m m
a 10 a 10 a 10
0 0.0000 35 0.1399 70 0.9385 
1 0.0000 36 0.1575 71 0.9474 
2 0.0000 37 0.1763 72 0.9554 
3 0.0000 38 0.1965 73 0.9624 
4 0.0001 39 0.2179 74 0.9685 
5 0.0001 40 0.2406 75 0.9738
6 0.0002 41 0.2644 76 0.9784 
7 0.0002 42 0.2894 77 0.9823 
8 0.0004 43 0.3153 78 0.9856 
9 0.0005 44 0.3421 79 0.9884
10 0.0008 45 0.3697 80 0.9907 
11 0.0010 46 0.3980 81 0.9927
12 0.0014 47 0.4267 82 0.9943 
13 0.0019 48 0.4559 83 0.9955 
14 0.0026 49 0.4853 84 0.9966 
15 0.0034 50 0.5147 85 0.9974 
16 0.0045 51 0.5441 86 0.9981 
17 0.0057 52 0.5733 87 0.9986
18 0.0073 53 0.6020 88 0.9990 
19 0.0093 54 0.6303 89 0.9992 
20 0.0116 55 0.6579 90 0.9995 
21 0.0144 56 0.6847 91 0.9996 
22 0.0177 57 0.7106 92 0.9998 
23 0.0216 58 0.7356 93 0.9998 
24 0.0262 59 0.7594 94 0.9999 
25 0.0315 60 0.7821 95 0.9999 
26 0.0376 61 0.8035 96 1.0000 
27 0.0446 62 0.8237 97 1.0000 
28 0.0526 63 0.8425 98 1.0000 
29 0.0615 64 0.8601 99 1.0000 
30 0.0716 65 0.8763 100 1.0000
31 0.0827 66 0.8912 
32 0.0952 67 0.9048 
33 0.1088 68 0.9173 
34 0.1237 69 0.9284 
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Table AIV.8 Friedman Test 
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$P = P(S \le s)$
$S = \frac{12m}{k(k+1)} \sum_{i=1}^{k} \left[ \bar{R}_i - \frac{1}{2}(k+1) \right]^2$
Note: k = number of treatment levels, m = number of blocks.

s P s P s P s P s P
k=3,m= 2.800 0.818 3.429 0.808 13.000 1.000 k=3,m=10 
3.600 0.876 3.714 0.888 14.250 1.000
0.000 0.167 4.800 0.907 4.571 0.915 16.000 1.000 0.000 0.026
1.000 0.500 5.200 0.961 5.429 0.949 0.200 0.170
3.000 0.833 6.400 0.976 6.000 0.973 0.600 0.290
4.000 1.000 7.600 0.992 7.143 0.979 s P 0.800 0.399

8.400 0.999 7.714 0.984 k=3,m=9 1.400 0.564
s P 10.000 1.000 8.000 0.992 1.800 0.632
8.857 0.996 0.000 0.029 2.400 0.684 
k=3,m= 10.286 0.997 0.222 0.186 2.600 0.778 
s P 10.571 0.999 0.667 0.315 3.200 0.813
0.000 0.056 k=3,m=6 11.143 1.000 0.889 0.431 3.800 0.865
2.000 0.639 0.000 0.044 14.000 1.000 2.000 0.672 5.000 0.922 
2.667 0.806 0.333 0.260 2.667 0.722 5.400 0.934 
4.667 0.972 1.000 0.430 2.889 0.813 5.600 0.954 
6.000 1.000 1.333 0.570 s P 3.556 0,846 6.200 0.970 
2.333 0.748 ea 4.222 (0.893 7.200, 0.974 
a P 3.000 0.816 4.667 0.931 7.400 0.982 
4.000 0.858 0.000 0.033 5.556 0.943 7.800 0.988 
k=3,m= 4.333 0.928 0.250 0.206 6.000 0.952 8.600 0.993
5.333 0.948 0.750 0.346 6.222 0.969 9.600 0.994
0.000 0.069 6.333 0.971 1.000 0.469 6.889 0.981 9.800 0.997
0.500 0.347 7.000 0.988 1.750 0.645 8.000 0.984 10.400 0.998
1.500 0.569 8.333 0.992 2.250 0.715 8.222 0.990 11.400 0.999
2.000 0.727 9.000 0.994 3.000 0.764 8.667 0.994 12.200 0.999
3.500 0.875 9.333 0.998 3.250 0.851 9.556 0.996 12.600 0.999
4.500 0.931 10.333 1.000 4.000 0.880 10.667 0.997 12.800 1.000
6.000 0.958 12.000 1.000 4.750 0.921 10.889 0.999 13.400 1.000 
6.500 0.995 5.250 0.953 11.556 0.999 14.600 1.000 
8.000 1.000 6.250 0.962 12.667 1.000 15.000 1.000
s P 6.750 0.970 13.556 1.000 15.200 1.000
s P k=3,m=7 7.000 0.982 14.000 1.000 15.800 1.000
7.750 0.990 14.222 1.000 16.200 1.000 
k=3,m=5 0.000 0.036 9.000 0.992 14.889 1.000 16.800 1.000 
an aads 0.286 0.232 9.250 0.995 16.222 1.000 18.200 1.000 
. 0.857 0.380 9.750 0.998 18.000 1.000 20.000 1.000 
0.400 0.309 1.143 0.514 10.750 0.999
ee ee 2.000 0.695 12.000 0.999 
0.633957 0.763 12.250 1.000 
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Table AIV.8 (continued) 
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$P = P(S \le s)$
$S = \frac{12m}{k(k+1)} \sum_{i=1}^{k} \left[ \bar{R}_i - \frac{1}{2}(k+1) \right]^2$
Note: k = number of treatment levels, m = number of blocks.
s P s P s P s P s P
k=3,m=11 20.182 1.000 15.167 1.000 8.000 0.988 k=3, m= 10 
22.000 1.000 15.500 1.000 8.769 0.991
0.000 0.024 16.167 1.000 9.385 0.993 0.571 0.306 
0.182 0.156 a 16.667 1.000 9.692 0.995 1.000 0.449 
0.545 0.268 s P 17.167 1.000 9.846 0.996 1.286 0.511 
0.727 0.371 k=3,m=12 18.000 1.000 10.308 0.997 1.714 0.562
1.273 0.530 18.167 1.000 11.231 0.998 1.857 0.656
1.636 0.597 0.000 0.022 18.500 1.000 11.538 0.998 2.286 0.695
2.182 0.649 0.167 0.144 18.667 1.000 11.692 0.999 2.714 0.758
2.364 0.744 0.500 0.249 19.500 1.000 12.154 0.999 3.000 0.812
2.909 0.781 0.667 0.346 20.167 1.000 12.462 0.999 3.571 0.833
3.455 0.837 1.167 0.500 20.667 1.000 12.923 0.999 3.857 0.850
3.818 0.884 1.500 0.566 22.167 1.000 14.000 1.000 4.000 0.884
4.545 0.900 2.000 0.617 24.000 1.000 14.308 1.000 4.429 0.910
4.909 0.913 2.167 0.713 14.923 1.000 5.143 0.920
5.091 0.938 2.667 0.751 15.385 1.000 5.286 0.937
5.636 0.957 3.167 0.809 s P 15.846 1.000 5.571 0.952
6.545 0.962 3.500 0.859 k=3,m=13 16.615 1.000 6.143 0.963
6.727 0.973 4.167 0.877 16.769 1.000 6.857 0.967
7.091 0.981 4.500 0.892 0.000 0.020 17.077 1.000 7.000 0.978 
7.818 0.987 4.667 0.920 0.154 0.134 17.231 1.000 7.429 0.983 
8.727 0.989 5.167 0.942 0.462 0.233 18.000 1.000 8.143 0.987
8.909 0.994 6.000 0.949 0.615 0.325 18.615 1.000 8.714 0.990 
9.455 0.996 6.167 0.962 1.077 0.473 19.077 1.000 9.000 0.992 
10.364 0.997 6.500 0.973 1.385 0.537 19.538 1.000 9.143 0.993 
11.091 0.998 7.167 0.980 1.846 0.588 19.846 1.000 9.571 0.995 
11.455 0.999 8.000 0.983 2.000 0.684 20.462 1.000 10.429 0.996 
11.636 0.999 8.167 0.989 2.462 0.722 21.385 1.000 10.714 0.997 
12.182 0.999 8.667 0.993 2.462 0.722 22.154 1.000 10.857 0.998
13.273 1.000 9.500 0.995 2.923 0.783 22.615 1.000 11.286 0.998
13.636 1.000 10.167 0.996 3.846 0.855 24.154 1.000 11.571 0.998
13.818 1.000 10.500 0.997 4.154 0.871 26.000 1.000 12.000 0.999
14.364 1.000 10.667 0.998 4.308 0.902 13.000 0.999
14.727 1.000 11.167 0.998 4.769 0.927 13.286 1.000
15.273 1.000 12.167 0.999 5.538 0.935 s P 13.857 1.000
16.545 1.000 12.500 0.999 5.692 0.950 k=3,m=14 14.286 1.000
16.909 1.000 12.667 0.999 6.000 0.963 14.714 1.000
17.636 1.000 13.167 1.000 6.615 0.972 0.000 0.019 15.429 1.000 
18.182 1.000 13.500 1.000 7.385 0.975 0.143 0.126 15.571 1.000 
18.727 1.000 14.000 1.000 7.538 0.984 0.429 0.219 15.857 1.000 
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Table AIV.8 (continued) 
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$P = P(S \le s)$
$S = \frac{12m}{k(k+1)} \sum_{i=1}^{k} \left[ \bar{R}_i - \frac{1}{2}(k+1) \right]^2$
Note: k = number of treatment levels, m = number of blocks.

s P s P s P s P s P
16.000 1.000 4.933 0.923 22.533 1.000 4.200 0.793 k=3,m=10 
16.714 1.000 5.200 0.940 22.800 1.000 5.000 0.825
17.286 1.000 5.733 0.953 22.933 1.000 5.400 0.852 8.100 0.981
17.714 1.000 6.400 0.958 23.333 1.000 5.800 0.925 8.400 0.986
18.143 1.000 6.533 0.970 24.133 1.000 6.600 0.946 8.700 0.988
18.429 1.000 6.933 0.977 24.400 1.000 7.000 0.967 9.300 0.993
19.000 1.000 7.600 0.982 25.200 1.000 7.400 0.983 9.600 0.994
19.857 1.000 8.133 0.986 26.133 1.000 8.200 0.998 9.900 0.997 
20.571 1.000 8.400 0.989 26.533 1.000 9.000 1.000 10.200 0.998 
21.000 1.000 8.533 0.990 28.133 1.000 10.800 0.999 
21.143 1.000 8.933 0.993 30.000 1.000 —— 11.100 1.000 
21.571 1.000 9.733 0.994 s P 12.000 1.000
22.286 1.000 10.000 0.995 a =4m< 
22.429 1.000 10.133 0.996 s P a ee: : a 
23.286 1.000 10.533 0.997 k=4,m=2 0.000 0.008 
24.143 1.000 10.800 0.997 Se 0.300 0.072 k=4,m=5 


24.571 1.000 11.200 0.998 0.000 0.042 0.600 0.100

26.143 1.000 12.133 0.999 0.600 0.167 0.900 0.200 0.120 0.025

28.000 1.000 12.400 0.999 1.200 0.208 1.200 0.246 0.360 0.056 
12.933 0.999 1.800 0.375 1.500 0.323 0.600 0.143
13.333 0.999 2.400 0.458 1.800 0.351 1.080 0.229 


Ss Ps 43733 1.000 (3.000 siS42—sia2100s—«i76—Ss«d'D(.291 
k=3,m=15 14.400 1.000 3.600 0.625 2.400 0.492 1.560 0.348
14.533 1.000 4.200 0.792 2.700 0.568 2.040 0.439 


0.000 0.018 14.800 1.000 4.800 0.833 3.000 0.611 2.280 0.479 
0.133 0.118 14.933 1.000 5.400 0.958 3.300 0.645 2.520 0.555 
0.400 0.206 15.600 1.000 6.000 1.000 3.600 0.676 3.000 0.592 


0.533 0.289 16.133 1.000 3.900 0.758 3.240 0.628 
0.933 0.427 16.533 1.000 ——___— 4.500 0.800 3.480 0.702 
1.200 0.487 16.933 1.000 s P 4.800 0.810 3.960 0.740
1.600 0.537 17.200 1.000 k=4,m=3 5.100 0.842 4.200 0.774
1.733 0.631 17.733 1.000 5.400 0.859 4.440 0.790


2.133 0.670 18.533 1.000 0.200 0.042 5.700 0.895 4.920 0.838 
2.533 0.733 19.200 1.000 0.600 0.090 6.000 0.906 5.160 0.849 
2.800 0.789 19.600 1.000 1.000 0.273 6.300 0.923 5.400 0.877 
3.333 0.811 19.733 1.000 1.800 0.392 6.600 0.932 5.880 0.893 
3.600 0.830 20.133 1.000 2.200 0.476 6.900 0.946 6.120 0.907 
3.733 0.865 20.800 1.000 2.600 0.554 7.200 0.948 6.360 0.925 
4.133 0.894 20.933 1.000 3.400 0.658 7.500 0.964 6.840 0.933
4.800 0.904 21.733 1.000 3.800 0.700 7.800 0.967 7.080 0.945 
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Table AIV.8 (continued) 
$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$P = P(S \le s)$
$S = \frac{12m}{k(k+1)} \sum_{i=1}^{k} \left[ \bar{R}_i - \frac{1}{2}(k+1) \right]^2$
Note: k = number of treatment levels, m = number of blocks.

s P s P s P s P s P
7.320 0.956 2.400 0.488 10.800 0.994 1.629 0.348 10.543 0.992 
7.800 0.966 2.600 0.569 11.000 0.996 1.800 0.410 10.714 0.993 
8.040 0.969 3.000 0.614 11.400 0.997 2.143 0.443 11.057 0.995 
8.280 0.977 3.200 0.625 11.600 0.997 2.314 0.476 11.229 0.996 
8.760 0.980 3.400 0.662 11.800 0.998 2.486 0.544 11.400 0.996 
9.000 0.983 3.600 0.683 12.000 0.998 2.829 0.582 11.743 0.997 
9.240 0.988 3.800 0.730 12.200 0.999 3.000 0.618 11.914 0.997 
9.720 0.991 4.000 0.744 12.600 0.999 3.171 0.634 12.086 0.998 
9.960 0.993 4.200 0.770 12.800 0.999 3.514 0.690 12.429 0.998

10.200 0.995 4.400 0.782 13.000 0.999 3.686 0.703 12.600 0.998 
10.680 0.997 4.600 0.803 13.200 0.999 3.857 0.738 12.771 0.999 
10.920 0.998 4.800 0.806 13.400 1.000 4.200 0.761 13.114 0.999 
11.160 0.998 5.000 0.837 13.600 1.000 4.371 0.780 13.286 0.999
11.640 0.998 5.200 0.845 13.800 1.000 4.543 0.805 13.457 0.999 
11.880 0.999 5.400 0.873 14.000 1.000 4.886 0.820 13.800 0.999 
12.120 0.999 5.600 0.886 14.400 1.000 5.057 0.839 13.971 0.999
12.600 1.000 5.800 0.892 14.600 1.000 5.229 0.857 14.143 1.000
12.840 1.000 6.200 0.911 14.800 1.000 5.571 0.878 14.486 1.000
13.080 1.000 6.400 0.912 15.000 1.000 5.743 0.882 14.657 1.000
13.560 1.000 6.600 0.927 15.200 1.000 5.914 0.900 14.829 1.000
14.040 1.000 6.800 0.934 15.400 1.000 6.257 0.907 15.171 1.000
15.000 1.000 7.000 0.940 15.800 1.000 6.429 0.915 15.343 1.000
7.200 0.944 16.000 1.000 6.600 0.927 15.514 1.000

7.400 0.957 16.200 1.000 6.943 0.937 15.857 1.000 

s P 7.600 0.959 16.400 1.000 7.143 0.944 16.029 1.000
k=4,m=6 7.800 0.963 17.000 1.000 7.286 0.948 16.200 1.000
8.000 0.965 18.000 1.000 7.629 0.959 16.543 1.000
0.000 0.004 8.200 0.968 7.800 0.962 16.714 1.000
0.200 0.043 8.400 0.971 7.971 0.965 16.886 1.000
0.400 0.060 8.600 0.977 s P 8.314 0.967 17.229 1.000
0.600 0.126 8.800 0.978 k=4,m=7 8.486 0.970 17.400 1.000 
0.800 0.156 9.000 0.983 8.657 0.977 17.571 1.000 
1.000 0.211 9.400 0.986 0.086 0.016 9.000 0.980 17.914 1.000
1.200 0.228 9.600 0.987 0.257 0.037 9.171 0.983 18.257 1.000 
1.400 0.321 9.800 0.990 0.429 0.094 9.343 0.985 18.771 1.000 
1.600 0.332 10.000 0.990 0.771 0.155 9.686 0.987 18.943 1.000 
1.800 0.391 10.200 0.991 0.943 0.200 9.857 0.988 19.286 1.000
2.000 0.426 10.400 0.993 1.114 0.243 10.029 0.990 19.971 1.000 
2.200 0.459 10.600 0.994 1.457 0.315 10.371 0.991 21.000 1.000 
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Table AIV.8 (continued) 


$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$P = P(S \le s)$
$S = \frac{12m}{k(k+1)} \sum_{i=1}^{k} \left[ \bar{R}_i - \frac{1}{2}(k+1) \right]^2$
Note: k = number of treatment levels, m = number of blocks.

s P s P s P s P s P
k=4,m=8 5.850 0.890 12.300 0.997 18.600 1.000 5.200 0.742 
6.000 0.894 12.450 0.998 18.750 1.000 5.600 0.775


0.000 0.002 6.150 0.900 12.600 0.998 19.050 1.000 6.000 0.825 
0.150 0.029 6.300 0.906 12.750 0.998 19.200 1.000 6.400 0.883 
0.300 0.041 6.450 0.919 12.900 0.998 19.350 1.000 6.800 0.933 
0.450 0.088 6.600 0.921 13.050 0.998 19.500 1.000 7.200 0.958
0.600 0.110 6.750 0.932 13.200 0.999 19.650 1.000 7.600 0.992 
0.750 0.151 7.050 0.940 13.350 0.999 19.800 1.000 8.000 1.000 
0.900 0.163 7.200 0.942 13.500 0.999 19.950 1.000 

1.050 0.235 7.350 0.949 13.650 0.999 20.250 1.000 


1.200 0.243 7.500 0.951 13.800 0.999 20.400 1.000 s P
1.350 0.290 7.650 0.954 13.950 0.999 20.550 1.000 k=5,m=3
1.500 0.319 7.800 0.958 14.250 0.999 20.700 1.000
1.650 0.346 7.950 0.962 14.400 0.999 20.850 1.000 0.000 0.000 
1.800 0.371 8.100 0.963 14.550 0.999 21.150 1.000 0.267 0.012 


1.950 0.442 8.250 0.969 14.700 1.000 21.600 1.000 0.533 0.028
2.250 0.483 8.550 0.972 14.850 1.000 21.750 1.000 0.800 0.059
2.400 0.493 8.700 0.975 15.000 1.000 21.900 1.000 1.067 0.086
2.550 0.529 8.850 0.977 15.150 1.000 22.200 1.000 1.333 0.155
2.700 0.550 9.000 0.978 15.300 1.000 22.950 1.000 1.600 0.169
2.850 0.596 9.150 0.981 15.450 1.000 24.000 1.000 1.867 0.232


3.000 0.611 9.450 0.984 15.600 1.000 2.133 0.280 
3.150 0.638 9.600 0.985 15.750 1.000 2.400 0.318
3.300 0.650 9.750 0.986 15.900 1.000 s P 2.667 0.351
3.450 0.674 9.900 0.986 16.050 1.000 k=5,m=2 2.933 0.405
3.600 0.677 10.050 0.989 16.200 1.000 3.200 0.441


3.750 0.713 10.200 0.989 16.350 1.000 0.000 0.008 3.467 0.507 
3.900 0.722 10.350 0.991 16.650 1.000 0.400 0.042 3.733 0.525
4.050 0.758 10.500 0.991 16.800 1.000 0.800 0.067 4.000 0.568
4.200 0.774 10.650 0.992 16.950 1.000 1.200 0.117 4.267 0.594
4.350 0.781 10.800 0.992 17.100 1.000 1.600 0.175 4.533 0.653
4.650 0.807 10.950 0.994 17.250 1.000 2.000 0.225 4.800 0.674
4.800 0.809 11.100 0.994 17.400 1.000 2.400 0.258 5.067 0.709
4.950 0.832 11.250 0.995 17.550 1.000 2.800 0.342 5.333 0.747
5.100 0.842 11.400 0.995 17.700 1.000 3.200 0.392 5.600 0.764
5.250 0.852 11.550 0.996 17.850 1.000 3.600 0.475 5.867 0.787
5.400 0.859 11.850 0.996 18.150 1.000 4.000 0.525 6.133 0.828
5.550 0.879 12.000 0.996 18.300 1.000 4.400 0.608 6.400 0.837
5.700 0.883 12.150 0.997 18.450 1.000 4.800 0.658 6.667 0.873
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Table AIV.8 (continued) 


$H_0: \mu_1 = \mu_2 = \cdots = \mu_k$
$P = P(S \le s)$
$S = \dfrac{12}{km(k+1)} \sum_{i=1}^{k} R_i^2 - 3m(k+1)$


Note: k = number of treatment levels, m = number of blocks.


6.933 0.883 3.200 0.448 11.000 0.992 
7.200 0.904 3.400 0.500 11.200 0.993 
7.467 0.920 3.600 0.521 11.400 0.994 
7.733 0.937 3.800 0.558 11.600 0.995 
8.000 0.944 4.000 0.587 11.800 0.996
8.267 0.955 4.200 0.605 12.000 0.996
8.533 0.962 4.400 0.630 12.200 0.997
8.800 0.972 4.600 0.671 12.400 0.998
9.067 0.974 4.800 0.683 12.600 0.998
9.333 0.983 5.000 0.714 12.800 0.999
9.600 0.985 5.200 0.725 13.000 0.999
9.867 0.992 5.400 0.751 13.200 0.999

10.133 0.995 5.600 0.773 13.400 0.999 

10.400 0.996 5.800 0.795 13.600 1.000 

10.667 0.997 6.000 0.803 13.800 1.000 

10.933 0.999 6.200 0.822 14.000 1.000

11.467 1.000 6.400 0.839 14.200 1.000 

12.000 1.000 6.600 0.857 14.400 1.000 

6.800 0.864 14.600 1.000 

7.000 0.879 14.800 1.000 

7.200 0.887 15.200 1.000 
k=5,m=4 7.400 0.905 15.400 1.000 

7.600 0.914 16.000 1.000

0.000 0.001 7.800 0.920

0.200 0.009 8.000 0.928 

0.400 0.020 8.200 0.937 

0.600 0.041 8.400 0.940 

0.800 0.060 8.600 0.951 

1.000 0.094 8.800 0.957 

1.200 0.105 9.000 0.962 

1.400 0.150 9.200 0.965 

1.600 0.185 9.400 0.972 

1.800 0.215 9.600 0.975 

2.000 0.241 9.800 0.979 

2.200 0.285 10.000 0.981 

2.400 0.315 10.200 0.983 

2.600 0.370 10.400 0.986 

2.800 0.388 10.600 0.989 

3.000 0.421 10.800 0.990 
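
To use the table, one first needs the observed value s of the Friedman statistic. A minimal sketch of that computation follows, assuming the standard form S = 12/(km(k+1)) ΣR_i² − 3m(k+1), where R_i is the sum of the within-block ranks for treatment i. The function name `friedman_statistic` and the list-of-blocks input layout are illustrative choices, not taken from the text, and the sketch assumes there are no ties within a block.

```python
def friedman_statistic(data):
    """Friedman statistic S for a randomized complete block layout.

    data: list of m blocks, each a list of k treatment responses.
    Assumption (not from the text): no tied values within a block.
    """
    m = len(data)          # number of blocks
    k = len(data[0])       # number of treatment levels
    rank_sums = [0] * k    # R_i: sum of ranks received by treatment i
    for block in data:
        # Rank the k responses within this block (1 = smallest).
        order = sorted(range(k), key=lambda j: block[j])
        for rank, j in enumerate(order, start=1):
            rank_sums[j] += rank
    sum_sq = sum(r * r for r in rank_sums)
    return 12.0 * sum_sq / (k * m * (k + 1)) - 3.0 * m * (k + 1)


# k = 5 treatments, m = 2 blocks with exactly reversed orderings.
print(friedman_statistic([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]]))  # 0.0
# k = 5 treatments, m = 2 blocks with identical orderings.
print(friedman_statistic([[1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]))  # 8.0
```

These two cases are the extremes of the k=5, m=2 panel: the table lists P(S ≤ 0.000) = 0.008 and P(S ≤ 8.000) = 1.000.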
definition of, 338
errors and, 341-342, 341t
examples for, 340
exercises for, 348-349
approach to regression, 434-436,
exercises for, 522-526
model for, 522
procedure for, 513-515b
testing assumptions of, 517-522
computer examples for, 543-554 
Minitab, 543-546 
SAS, 548-554 
SPSS, 546-547 
decomposition of, 511-512, 512f
Kruskal-Wallis test v., 631
for Latin squares, 481 
multiple comparisons, 536-542, 
538t 
example for, 538-541 
exercises for, 541-542 
Tukey’s method, 537b, 538t 
for multiple linear regression model, 
449-450, 449t 


example for, 450, 450t 
projects for, 554-557 
in linear models, 556-557 
with missing observations, 556 
transformations, 554-555 
randomized complete block design, 
526-535, 528t, 529t 
computational procedure for, 
530-531b 
decomposition of, 527-529 
example for, 532-533 
exercises for, 534-535 
for single-factor experiments, 469 
summary for, 543 
t-test v., 501, 506-508, 536 
for two treatments, 501-509 
example for, 506-507 
exercises for, 508-509
MSE and MST, 504-505 
procedure for, 505-506b 
SSE and SST, 502-504 
Analysis of variance F-test, 556-557 
Angular transformations, for ANOVA, 
555 
ANOVA. See Analysis of Variance 
Aperiodic state, 755 
Area sampling. See Cluster sampling 
ASEs. See Averaged squared errors 
Associative law, 749-750b 
Asymptotic properties, 285-286 
Atkinson, 487 
Average deviation, 30 
Averaged squared errors (ASEs), 287 
Axiomatic probability, 57-58, 57b
definition of, 13

example for, 13, 13t, 14f 

SAS examples for, 48-50
Bayes’ rule

application of, 77b 

definition of, 76-77, 562 

example for, 77-78
Credible intervals
Bayesian decision theory, 588-595
optimal decision, 591
596-597
procedure for, 589b
informative prior, 564-565, 565t
Bayesian point estimation (continued)
exercises for, 577-579
introduction to, 560-562
posterior distribution, 563
procedure for, 570, 570b, 571f

Bell-shaped curve 
example of, 32, 33f 
histogram and, 19-20 
normal probability distribution and, 

125, 126f 

Bernoulli population 
efficiency example for, 275-276 
unbiased estimators and, 247 

Bernoulli random variable 
law of large numbers for, 167 
method of moments and, 228-229 
mixture distribution and, 180-181 
probability function of, 115 
sufficient estimators and, 252-253 

Beta distribution, 136 

Between group sum of squares. See Sum 

of squares for treatment 

Bias 
definition of, 247 
jackknife method for, 659 
in loss function, 491-492 
in MSE, 250-251 
occurrence of, 702 

Biased estimators, 247 

Binomial experiment, 115 

Binomial probability distribution, 

114-119 
definition of, 101 
examples for, 116-118 
hypothesis testing with, 367-368 
normal approximation to, 213-216, 
214f 
continuity correction for, 214-215, 
214-215b, 215f 
example for, 215-216 
Poisson approximation to, 121, 
121b 
of random variable, 101 
recursive calculation of, 182 

Binomial random variable 
definition of, 101 
examples for, 116-118 
expected value of, 118-119, 118b 
mean of, 101-102, 118-119, 118b 
moment-generating function of, 

118-119, 118b 
SAS examples for, 178-180 
variance of, 118-119, 118b 


Binomial theorem, 116 
Birthday problem, 67-68, 112 
Bivariate data, modeling, 730-732, 
730f 
Blinding, 471 
Block, 471-472 
Blocking, 471-472 
Bonferroni procedure, 536 
Bootstrap confidence interval, 667-669 
procedure to find 
for mean, 667-668b, 668 
for median, 668b 
Bootstrap method 
computation of, 699-700 
confidence interval, 667-669 
description of, 663-664 
example for, 664 
jackknife method v., 664 
standard error and, 663, 665-666, 
665b, 666-667b 
example for, 666-667 
Box plots, 33-35, 34f 
for ANOVA, 518-522, 521f 
definition of, 33 
example for, 34-35, 35f 
tying it all together, 735-743, 736f, 
741f 
for hypothesis testing, 365-366, 
366f 
Minitab examples for, 44-45, 45f 
for outliers, 709 
procedure for construction of, 33b 
SAS examples for, 48-50 
side-by-side, 704 
Box-and-whisker plots. See Box plots 
BUGS. See Bayesian inference Using 
Gibbs Sampling 


C 


“Cannot reject,” with Minitab, 46 
Categorical data. See Qualitative data 
cdf. See Cumulative distribution function 
Census study, 8 
Center of data, with histogram, 19-20 
Central kth moment. See kth moment 
about its mean 
Central Limit Theorem (CLT), 125, 
168-171, 168b 
definition of, 168-169, 168b 
examples for, 169-171 
large sample approximations and, 
212-213
large sample confidence interval,
300 
normal approximation to binomial 
distribution and, 214 
in statistics, 171 
Student t-distribution v., 198 
Chapman-Kolmogorov equation,
753-754
Chebyshev, Pafnuty, 164
Chebyshev’s theorem, 164-165, 164b 
consistency test with, 267-268 
definition of, 164 
examples for, 165-166 
proof of, 164-165 
Chi-square distribution 
confidence interval for population 
variance and, 315-320, 316f 
definition of, 135, 135f 
examples for, 195-196 
probability, 197-198 
sample variance, 197 
exercises for, 204-207 
Friedman test and, 636 
example for, 637-638, 637t, 638t 
Neyman-Pearson lemma and, 
353-354 
sampling distributions associated 
with normal populations, 
192-198, 194f, 195f 
summary for, 198b 
Chi-square random variable 
degrees of freedom of, 193 
F-distribution and, 202 
from gamma random variable, 193 
mean, variance, and mgf of, 136b
from standard normal random 
variable, 193, 194f 
Chi-square tests 
for count data, 388-398 
exercises for, 397-398 
goodness of fit, 389 
multinomial distribution testing, 
390-392 
test for independence, 392-395 
testing to identify probability 
distribution, 395-397 
for goodness of fit, 395-397 
Kruskal-Wallis test and, 632 
for observed frequency, 389-390 
Claimed mean, in hypothesis testing, 
364 
Class boundaries, 17 
Class mark, 17 


Classical probability 
computing method for, 56, 56b 
definition of, 56b 
Clinical studies, 4 
CLT. See Central Limit Theorem 
Cluster sampling, 11 
Coefficient of determination, 422, 
461-462 
Combinations formula, 65-66b, 66 
example for, 66 
Commutative law, 749-750b 
Complement, 749, 749f 
Complement law, 749-750b 
Completely randomized design 
ANOVA, 510-526, 512f 
decomposition of, 510-512, 
512f 
example for, 518-522, 518t, 
519f, 520f, 521f 
exercises for, 522-526 
model for, 522 
procedure for, 513-515b 
p-value approach, 515-517, 
517t 
testing assumptions of, 517-522 
definition of, 474 
example for, 469 
Minitab example, 543-545 
SAS example, 548-549 
SPSS example, 546-547 
for two or more populations, 
513-515b 
Computer examples 
for ANOVA, 543-554 
Minitab, 543-546 
SAS, 548-554 
SPSS, 546-547 
for Bayesian computation, 596 
for descriptive statistics, 41-51 
Minitab, 41-46 
SAS, 47-50 
SPSS, 46-47 
for empirical methods, 698-699 
for experimental design, 494-497 
Minitab, 494 
SAS, 494-497 
for hypothesis testing, 399-408 
Minitab, 399-403 
SAS, 405-408 
SPSS, 403-405 
for interval estimation, 330-333 
Minitab, 330-332 
SAS, 333 


SPSS, 332 
for linear regression models, 
455-461 
Minitab, 455-456 
SAS, 458-461 
SPSS, 457-458 
for nonparametric tests, 642-652 
Minitab, 642-646 
SAS, 648-652 
SPSS, 646-648 
for point estimation, Minitab, 
283-285 
for probability theory, 108-111, 
175-180 
Minitab, 109-110, 175-177 
SAS, 110-111, 178-180 
SPSS, 110, 177 
for sampling distributions, 219-221 
Minitab, 219 
SAS, 219-221 
SPSS, 219 
Computers, statistics and, 39-40 
Conditional expectation 
definition of, 147 
example for, 148 
Conditional probability density function, 
Gibbs algorithm and, 692 
Conditional probability distribution 
definition of, 71, 144 
example for, 72-73, 144-146, 
145f 
exercises for, 78-83 
properties of, 72b 
Confidence coefficient, 292 
Confidence interval 
bootstrap method and, 663, 
667-669 
calculation of, 292-293 
example for, 293-295 
computer examples for, 330-333 
Minitab, 330-332 
SAS, 333 
SPSS, 332 
concerning two population 
parameters, 321-329 
difference of two means, 321-324, 
321b, 322b 
exercises for, 327-329 
for probability, 325-326, 
325b 
for variance, 326-327, 326b 
conducting a statistical test with, 
409-410
definition of, 292
example for tying it all together, 
735-743 
exercises for, 298-300 
in hypothesis testing, 409-410 
for dependent samples, 384 
jackknife, 659 
large sample, 300-310 
for difference of two means, 
321-322, 321b 
example for, 300-303 
exercises for, 306-310 
margin of error and sample size, 
303-306 
Minitab examples for, 331 
procedure for calculation of, 
300b 
projects for, 334-335 
for proportion, 302-303, 325, 
325b 
nonparametric, 601-606, 602f 
example for, 603-605 
exercises for, 605-606 
median for, 602-603, 602f 
procedure for finding, for median, 
603b 
pivotal method for, 293-298, 295f, 
296f 
example for, 296-298 
procedure for, 296b 
for population variance, 315-320, 
316f 
examples for, 317-318 
exercises for, 318-320 
procedure for, 317b 
projects for, 334-336 
based on sampling distributions, 
334 
large sample confidence intervals, 
334-335 
prediction interval from normal 
population, 336 
simulation of coverage of small 
confidence intervals, 334 
for regression coefficients, 429b 
example for, 430-431 
for simple model for univariate data, 
727-729, 729f 
small sample, 310-315 
examples for, 311-312 
exercises for, 313-315 
procedure for, 310-311b, 311 
Conjugate prior, 567 


Consistency, 266-269, 266f 
definition of, 266 
examples for 
sample mean, 267 
sample variance and MLEs, 
268-269 
exercises for, 279-282 
test for, 267-268, 267b 
procedure for, 268b 
of unbiased estimator, 266b 
uniqueness and, 269 
Contingency, investigation of, 
392-395 


Contingency table, Minitab example for, 


402 
Continuous random variable 


cumulative distribution function for 


definition of, 86 


examples for, 87-90, 87f, 88f, 89f, 


90f 
properties of, 87b 
definition of, 86 
expected value of 
definition of, 93 
examples of, 93-94, 94f 


Metropolis algorithm for, 685-686b 


M-H algorithm for, 689-690, 690b 
Neyman-Pearson lemma and, 
350-352 


Poisson probability distribution and, 


122 
posterior distribution for, 567-569 
probability density function for 
definition of, 86 
examples, 87-90, 87f, 88f, 89f, 
90f 
simulations of, 221-222 
Control plot, for Taguchi methods, 
489-490, 490f, 491f 
Correlation 
definition of, 149, 441 
probability distribution of, 442 
Correlation analysis, 440-444 
definition of, 441 
exercises for, 444 
Correlation coefficient 
definition of, 149, 441 
hypothesis test for, 442b 
example for, 443 
linear regression model and, 
441-442 
properties of, 149b 


Count data, chi-square tests for, 
388-398 
exercises for, 397-398 
goodness of fit, 389 
multinomial distribution testing, 
390-392 
test for independence, 392-395 
testing to identify probability 
distribution, 395-397 
Countable, 750 
Countably infinite, 750 
Counting random variable, 119 
Counting techniques, probability 
calculation and, 63-71 
exercises for, 69-71 
Coupon collector's problem, 181 
Covariance 
definition of, 148 
example for, 149-150 
properties of, 149b 
Cramér-Rao inequality, 273b, 274 


Cramér-Rao procedure, to test efficiency, 


274b 
example for, 274-275 
Credible intervals, 579-584, 580f 
definition of, 579-580 
examples for, 580-581, 581f 
with HPD, 583 
exercises for, 583-584 
highest posterior density, 582 
procedure for, 581, 582b 
Cross-sectional data 
definition of, 5 
example for, 6, 6t, 7t 
Cumulative distribution function (cdf) 
for continuous random variables 
definition of, 86 


examples for, 87-90, 87f, 88f, 89f, 


90f 
properties of, 87b 
for discrete random variables 
definition of, 84 
examples for, 85-86, 85f, 86f 
find pdf with, 155 
find with Poisson probability 
distribution, 156 


method of distribution functions for, 


155b, 156 
Minitab examples for, 175-177 
Cumulative relative frequency 
definition of, 17 
example of, 17-18
D
Data 
graphical representation of, 13-26 
bar graph, 13-14, 13t, 14f 
box plots, 704 
dotplot, 703 
exercises for, 20-26, 707-708 
Pareto graph, 14, 15f 
pie chart, 15, 15t, 16f 
quantile quantile plot, 705-706 
scatterplot, 704-705, 704f, 705f 
stem-and-leaf plot, 16-17, 16t 
numerical description of, 26-39 
box plots, 33-35 
exercises for, 35-39 
grouped data, 30-33 
Data collection, 3 
general procedures for, 3b 
Data types, in descriptive statistics, 
5-8 
de Méré, Chevalier, 54 
de Moivre, Abraham, 125, 183-184 
De Morgan’s laws, 749-750b 
Degrees of freedom, 192 
Density function, of normal probability 
distribution, 125 
Denumerable, 750 
Dependency, investigation of, 392-395 
Dependent event 
definition of, 74 
example for, 74 
Dependent samples, hypothesis testing 
for, 382-385 
confidence interval, 384 
exercises for, 385-388 
matched pairs, 382, 383-384, 383b 
Dependent variable, in regression 
analysis, 412 
Descriptive statistics, 1-51 
basic concepts of, 3-8 
data types, 5-8 
exercises for, 7-8 
computer examples for, 41-51 
exercises for, 50-51 
Minitab, 41-46 
SAS, 47-50 
SPSS, 46-47 
computers and statistics, 39-40 
definition of, 2, 5
graphical representation of data, 
13-26 
bar graph, 13-14, 13t, 14f 


exercises for, 20-26 
Pareto graph, 14, 15f 
pie chart, 15, 15t, 16f 
stem-and-leaf plot, 16-17, 16t 
introduction to, 2-3 
data collection, 3 
numerical description of data, 26-39 
box plots, 33-35 
exercises for, 35-39 
grouped data, 30-33 
probability theory and, 54 
projects for, 51 
sampling schemes, 8-12 
errors in, 11-12 
exercises for, 12 
sample size, 12 
summary of, 40-41 
Design of experiment (DOE), 465-497 
completely randomized design 
definition of, 474 
example for, 469 
computer examples for, 494-497 
Minitab, 494 
SAS, 494-497 
concepts from, 467-483 
exercises for, 482-483 
fundamental principles, 471-474 
specific designs, 474-481 
terminology, 467-471 
elements of, 467 
factorial design, 483-487 
definition of, 483 
exercises for, 486-487 
fractional, 486 
full, 485-486 
one-factor-at-a-time design, 
483-485, 485f 
Greco-Latin squares, 481, 481t 
introduction to, 466-467 
Latin square design 
definition of, 477 
example for, 478, 478t 
history of, 477-478 
procedure for constructing, 478b, 
479-480, 479t, 480t 
observational study v., 468 
optimal design, 487-489 
sample size selection, 487-489 
projects for, 497 
randomized complete block design 
definition of, 474 
examples for, 475 


procedure for, 474-475b 
with replications, 475-476b, 
476-477 
summary for, 493-494 
Taguchi methods, 489-493, 490f, 
491f 
exercises for, 492-493 
variance in, 470-471 
Design parameters, 492 
Difference 
of set, 749 
symmetric, 749 
Discrete distribution, sufficient statistic 
for, 259-260 
Discrete random variable 
cumulative distribution function for, 
84 
definition of, 84 
example for, 84-85 
expected value of 
definition of, 92 
examples of, 93-94, 94f, 96 
Metropolis algorithm for, 685b 
M-H algorithm for, 689b 
Poisson probability distribution and, 
120 
probability mass function for, 84 
uniform distribution of, 96 
Disjoint. See Mutually exclusive 
Distribution functions. See Cumulative 
distribution function; Probability 
distribution functions 
Distributional model, histogram and, 
19-20 
Distributive law, 749-750b 
DOE. See Design of experiment 
Dominant trait, 73 
Dotplot 
example of, 703 
tying it all together, 735-743, 736f, 
741f
for normality, 714-715, 714f 
for simple model for univariate data, 
727-729, 729f
use of, 703 
Double-blind, 471 


E 
Efficiency, 270-277 
Cramér-Rao inequality, 273b, 274 
Cramér-Rao procedure to test, 274b 
example for, 274-275
definition of, 270
efficient estimator, 274, 274b 
examples for 
Bernoulli population, 275-276 
Poisson distribution, 275 
with sample mean and variance, 
270-272 
exercises for, 279-282 
relative 
definition of, 272 
example for, 272-273 
relative test for, 270b 
uniformly minimum variance 
unbiased estimator, 273 
Efficient estimator, 274, 274b 
Efron, Bradley, 663 
80/20 rule. See Pareto effect 
Elements, 747 
EM algorithm. See Expectation 
maximization algorithm 
Empirical distribution function, 288-289 
Empirical mean. See Sample mean 
Empirical methods, 657-700 
bootstrap methods, 663-669 
confidence interval, 667-669 
description of, 663-664 
example for, 664 
jackknife method v., 664 
standard error and, 663, 665-667, 
665b, 666-667b 
computer examples for, 698-699 
expectation maximization algorithm, 
669-681 
examples for, 673-679 
exercises for, 680-681 
log-likelihood function and, 680 
overview of, 670-671 
steps of, 671-673, 671b 
use of, 669-670 
introduction to, 658 
jackknife method, 658-663 
exercises for, 661-663 
history of, 658 
procedure for point and interval 
estimation, 660-661, 660b, 660t, 
661t
use of, 658-659 
Markov chain Monte Carlo, 681-697 
algorithms for, 682 
in Bayesian analysis, 682 
with Bayesian estimation, 562 
construction of, 683-685
exercises for, 696-697 
Gibbs algorithm, 682, 692-695, 
693f, 695b 
issues in, 695-696 
Metropolis algorithm, 682, 
685-686b, 685-688, 689f 
Metropolis-Hastings algorithm, 
682, 688-692, 689b, 690b 
Monte Carlo integration, 682, 683b 
objective of, 682 
references for, 696 
projects for, 699-700 
summary for, 697-698 
Empirical rule, 32b 
Enumerable, 750 
Ergodic 
definition of, 755 
Markov chain, 755-756 
Ergodic theorem, 756 
Erlang distribution, 131
Error probabilities 
example for, 342-344 
statistical decision and, 340-342, 
341t 
Error variance, in linear regression 
models, estimation of, 425 
Errors 
in hypothesis testing, 340-342, 
341t 
normality of, 453 
in sample data, 11-12 
E-step. See Expectation step 
Estimates, 227 
Estimators. See also Maximum likelihood 
estimators 
biased, 247 
definition of, 227 
least-squares 
derivation of, 416-421, 420f 
Gauss-Markov theorem, 424b
inferences on, 428-437, 435t 
for multiple linear regression 
model, 446 
properties of, 422-425 
method of maximum likelihood for, 
235-246 
exercises for, 244-246 
likelihood function in, 235-236 
maximum likelihood estimators, 
236 
method of moments for, 227-235 
definition of, 228b 


exercises for, 233-235 
generalized, 233 
Poisson distribution, 232-233 
population parameters, 228-230 
population probability density 
function, 231-232 
procedure for, 228b 
for sample mean and variance, 
230-231 
properties of, 246-282 
unbiased estimators, 247-252 
sufficiency, 252 


Euler, Leonhard, 477-478 
Expectation maximization (EM) 


algorithm, 669-681 
example for 
censored survival times, 673-676 
normal sample, 677-678 
unknown variables, 678-679 
exercises for, 680-681 
log-likelihood function and, 680 
overview of, 670-671 
steps of, 671-673, 671b 
use of, 669-670 


Expectation step (E-step), of EM 


algorithm, 671-673 


Expected frequency, 389 
Expected value 


of binomial random variable, 
118-119, 118b 
of continuous random variables 
definition of, 93 
examples of, 93-94, 94f 
of discrete random variables 
definition of, 92 
examples for, 93-94, 94f, 96-98 
with joint probability function, 146 
example for, 147 
MCMC and, 683 
with median test, 621 
Minitab examples for, 109-110 
properties of, 95b, 146b 
of sample variance, 188-189 
SAS examples for, 110-111 
SPSS examples for, 110 
of uniform random variable, 123b, 
124 


Experiment 


binomial, 115 
definition of, 55 


Experimental design, 465-497 


completely randomized design 
definition of, 474 


example for, 469 
computer examples for, 494-497 
Minitab, 494 
SAS, 494-497 
concepts from, 467-483 
exercises for, 482-483 
fundamental principles, 471-474 
specific designs, 474-481 
terminology, 467-471 
elements of, 467 
factorial design, 483-487 
definition of, 483 
exercises for, 486-487 
fractional, 486 
full, 485-486 
one-factor-at-a-time design, 
483-485, 485f 
Greco-Latin squares, 481, 481t 
introduction to, 466-467 
Latin square design 
definition of, 477 
example for, 478, 478t 
history of, 477-478 
procedure for constructing, 478b, 
479-480, 479t, 480t 
observational study v., 468 
optimal design, 487-489 
sample size selection, 487-489 
projects for, 497 
randomized complete block design 
definition of, 474 
examples for, 475 
procedure for, 474-475b 
with replications, 475-476b, 
476-477 
single-factor and multifactor, 
469-470 
summary for, 493-494 
Taguchi methods, 489-493, 490f, 
491f 
exercises for, 492-493 
variance in, 470-471 


Experimental error 


analysis of variance for, 470-471 
definition of, 470 


Experimental units 


definition of, 468 
example for, 469 


Explanatory variable. See Independent 


variable 


Exponential probability distribution 


definition of, 133, 134f 
examples for, 134-135 


generating samples from, 181 

random variable simulation with, 221 

SAS example for, 220-221 
Exponential random variables 

definition of, 133 

mean, variance, and mgf of, 134b


F 


Factor levels 
definition of, 468 
examples for, 469-470 
Factorial design, 483-487 
definition of, 483 
exercises for, 486-487 
fractional, 486 
full, 485-486 
one-factor-at-a-time design, 
483-485, 485f 
definition of, 483-484 
example for, 484, 485f 
Factorization theorem, for joint 
sufficiency, 258b 
examples for, 259-260 
Factors. See also Independent variable 
definition of, 467-468 
examples for, 469-470 
F-distribution, 202-204, 202f, 203f 
definition of, 202 
example for, 204 
exercises for, 204-207 
regression analysis and, 434 
theorem for, 203-204 
Fermat, Pierre, 54
Finance, 4 
Finite population correction factor, 187 
Finite population, in sampling 
distributions, 187-189 
Finite set, 747 
Fisher information, 276-277 
Fisher, Sir Ronald, 1-2, 235, 478, 500 
Fisher z-transform, 442 
Fractional factorial design, 486 
Frequency interpretation of probability, 
67 
examples for, 67-69 
Frequency probability, 57b 
Frequency table 
creation of, 17 
definition of, 17 
guidelines for construction of, 
18-19b 
SAS examples for, 48-50 


Friedman test, 634-638 

chi-square distribution and, 636 
example for, 637-638, 637t, 638t 

example for, 635-636, 635t, 636t 
Minitab example for, 645-646 
procedure for, 634-635b 

F-test, analysis of variance, 556-557 

Full conditionals, 693 

Full factorial design, 485-486 


G 
Galton, Sir Francis, 411-412 
Gamma density, 131 
Gamma function, 131 
Gamma probability distribution, 
131-136, 132f, 134f, 135f 
definition of, 131 
examples of, 132-133 
maximum likelihood estimators 
with, 242-243 
method of moments and, 229-230 
plotting of, 131, 132f 
Gamma random variable 
chi-square random variable from, 
193 
mean, variance, and mgf of, 132b
Gauss, Carl Friedrich, 54, 113-114, 
125 
Gaussian distribution, 125 
central limit theorem and, 169 
Gauss-Markov theorem, for least-squares 
estimators, 424b 
Generalized method of moments 
(GMM), 233 
Genetics, probability and statistics in, 
73-74 
example for, 73-74, 74t 
Hardy-Weinberg Law, 112, 116-117 
Geometric distribution, maximum 
likelihood estimators with, 
237-238 
Gibbs algorithm 
assumption for, 692-693, 693f 
example for, 693-694 
for MCMC, 682, 692-695, 693f 
summary of, 695b 
Gibbs sampler. See Gibbs algorithm 
GM. See Grand mean 
GMM. See Generalized method of 
moments 
Goodness of fit 
for ANOVA, 517
chi-square test, 395-397
definition of, 389 
example for, 389, 396-397 
probability distributions, 395b 
test for, 390-392 
examples for, 390-392 
summary of, 390b 
Grand mean (GM), in ANOVA, 511-512, 
512f 
Graphical representation, 13-26, 
702-708, 704f, 706f 
bar graph, 13-14, 13t, 14f 
box plots, 704 
dotplot, 703 
exercises for, 20-26, 707-708 
Pareto graph, 14, 15f 
pie chart, 15, 15t, 16f 
quantile quantile plot, 705-706 
example of, 706 
scatterplot, 704, 704f 
example for, 704-705, 705f 
stem-and-leaf plot, 16-17, 16t 
Greco-Latin squares, 481, 481t 
Green revolution, 500 
Grouped data 
definition of, 17 
mean of, 30-31, 30t, 31t 
numerical measures for, 30-33 
variance of, 30 


H 
Hardy-Weinberg Law, 112, 116 
example for, 116-117 
Heteroscedastic errors, 431 
High leverage points, in linear regression 
models, 462-463 
Highest posterior density (HPD), 582 
example for, 583 
Histogram 
data transformation and, 717-719, 
718f, 719 
definition of, 18 
dotplot v., 703 
empirical rule and, 32b 
example of, 19, 20f 
guidelines for construction of, 
18-19b 
Minitab examples for, 43, 43f, 219 
for normality, 714-715, 714f 
SPSS examples for, 46 
use of, 18 
Homogeneous Markov chains, 752 


Homoscedastic errors, 431 
Homoscedasticity, least-squares 
regression model and, 452, 452f 
HPD. See Highest posterior density 
Hypothesis testing, 337-410. See also 
Nonparametric hypothesis test 
Bayesian, 584-588 
example for, 586 
exercises for, 587-588 
Jeffreys’ hypothesis testing criterion, 
585-586 
odds ratio, 585 
procedure for, 587b 
chi-square tests for count data, 
388-398 
exercises for, 397-398 
goodness of fit, 389 
multinomial distribution testing, 
390-392 
test for independence, 392-395 
testing to identify probability 
distribution, 395-397 
computer examples for, 399-408 
Minitab, 399-403 
SAS, 405-408 
SPSS, 403-405 
confidence interval for, 409-410 
for correlation coefficient, 442b 
example for, 443 
error probabilities in, 340-342, 
341t 
examples for, 340 
error probabilities, 342-346 
exercises for, 348-349 
general method for, 339b 
introduction to, 338-349 
sample size, 346-348 
likelihood ratio tests, 355-361 
definition of, 357b 
examples for, 357-360 
exercises for, 360-361 
procedure for, 359b 
UMP tests, 355-356 
Neyman-Pearson lemma, 349-355 
example for, 352-353 
exercises for, 355 
procedure for applying, 353b 
theorem for, 350-352 
nonparametric 
for multiple samples, 630-640 
for one sample, 606-620 
for two samples, 620-630 
projects for, 408-410 


conducting a statistical test with 
confidence interval, 409-410 
testing on computer-generated 
samples, 408-409 
for regression coefficients, 431b, 
432b 
example for, 432-433 
for single parameter, 361-372 
examples for, 365-368, 366f 
exercises for, 370-372 
large sample, 368b 
nonparametric, 606-620 
p-value, 361-363 
summary of, 364-365b 
testing, 363-372 
variance, 368-369b 
statistical hypothesis, 338-339, 339b 
statistical inference and, 561 
steps in any, 363b 
summary for, 399 
for two samples, 372-388 
dependent samples, 382-385 
equality of variances, 380-382, 
381b 
exercises for, 385-388 
independent samples, 373-382 
large sample hypothesis testing, 
373-374b, 374 
nonparametric, 620-630 
small sample of two population 
means, 375-379, 375b 
for two proportions, 379-380b, 
379-381 
Wilcoxon signed rank test procedure, 
611-612b 


I
Idempotent law, 749-750b
Identically distributed, 184 
Identity law, 749-750b 
Impossible event, 55 
Independence sampler, 688 
Independent event, 74 
Independent random variables 
distribution of, 160-161, 161f 
examples for, 144-146, 145f 
pdf and, 144 
in Student t-distribution, 200 
Independent samples 
hypothesis testing for, 373-382 
equality of variances, 380-382, 
381b
example for, 394-395
exercises for, 385-388 
large sample hypothesis testing, 
373-374b, 374 
for large samples, 373-374b 
small sample population means, 
375-379, 375b 
of two factors, 392-395 
for two proportions, 379-380b, 
379-381 
test for, 724 
Independent variable 
definition of, 467 
example for, 469 
in regression analysis, 412 
Inferential statistics 
definition of, 5 
probability theory and, 54 
Infinite set, 747 
Influential observations, least-squares 
regression model and, 453 
Informal probability, 55b 
Informative priors, in Bayesian point 
estimation, example of, 564-565, 
565t 
Input variable. See Independent variable 
Interquartile range (IQR) 
definition of, 27 
example for, 28-29 
Intersection, 748, 749f 
Interval estimation, 291-336 
computer examples for, 330-333 
Minitab, 330-332 
SAS, 333 
SPSS, 332 
concerning two population 
parameters, 321-329 
difference of two means, 321-324, 
321b, 322b 
exercises for, 327-329 
for probability, 325-326, 325b 
for variance, 326-327, 326b 
confidence interval 
calculation of, 292-293 
definition of, 292 
definition of, 292 
introduction to, 292-300 
exercises for, 298-300 
jackknife method procedure for, 660b 
example for, 660-661, 660t, 661t 
large sample confidence interval, 
300-310 
example for, 300-303 


exercises for, 306-310 
margin of error and sample size, 
303-306 
procedure for calculation of, 300b 
for proportion, 302-303, 325, 325b 
for population variance, 315-320, 
316f 
examples for, 317-318 
exercises for, 318-320 
procedure for, 317b 
projects for, 334-336 
based on sampling distributions, 
334 
large sample confidence intervals, 
334-335 
prediction interval from normal 
population, 336 
simulation of coverage of small 
confidence intervals, 334 
small sample confidence intervals, 
310-315 
examples for, 311-312 
exercises for, 313-315 
procedure for, 310-311b, 311 
statistical inference and, 561 
summary for, 330 
Interval estimator 
definition of, 292 
purpose of, 292 
Invariance property, of maximum 
likelihood estimators, 243 
IQR. See Interquartile range 
Irreducible Markov chain, 755 


J 


Jackknife confidence interval, 659 
Jackknife estimate, 659 
Jackknife method, 658-663 
bootstrap method v., 664 
exercises for, 661-663 
history of, 658 
procedure for point and interval 
estimation, 660b 
example for, 660-661, 660t, 661t 
use of, 658-659 
Jacobian of transformation, 159 
Joint probability density function, 159 
of order statistics, 208, 210 
Joint probability distributions, 141-154, 
145f 
conditional expectation, 147-148 
covariance and correlation, 148-150 


definition of, 141 
exercises for, 150-154 
expected value, 146-147, 146b 
independent random variables, 
144-146, 145f 
marginal pmf, 143-144 
MLE with, 244 
Joint probability function 
with Bayes theorem, 562 
example for, 574-575 
definition of, 141 
examples for, 142 
expected value with, 146 
example for, 147 
Jointly sufficient 
definition of, 257 
examples for, 258-260 
factorization criteria for, 258b 


K 
Khintchine, A., 167 
Kiefer, J., 487 
Kolmogorov, Andrei, 54 
Kruskal-Wallis test, 631-634 
for ANOVA, 518 
chi-square approximation, 632 
example for, 632-634, 633t 
Friedman test v., 634 
Minitab example for, 644-645 
procedure for, 631-632b 
SAS example for, 650-652 
SPSS example for, 647-648 
theorem of, 632 
kth moment about its mean, 99 
kth moment about the origin 
definition of, 99 
in method of moments, 227-228 
kth order statistic 
definition of, 207-208 
probability density function of, 208 
example for, 209 
Kurtosis, 98-105 
definition of, 99 


L 


Laboratory experiments, 4 
Laplace, Pierre, 54, 125 
Large sample approximations, 212-218 
exercises for, 216-218 
normal approximation to binomial 
distribution, 213-216, 214f, 215f 
Large sample confidence interval, 
300-310 
for difference of two means, 
321-322, 321b 
example for, 300-302 
exercises for, 306-310 
margin of error and sample size, 
303-306 
examples for, 305-306 
Minitab examples for, 331 
procedure for calculation of, 300b 
projects for, 334-335 
for proportion, 302-303, 325b 
example for, 303, 325 
Large sample hypothesis testing, 
364-365b 
independent samples, 373-374b 
example for, 374 
median test, 622-623b 
sign test, 610b 
Wilcoxon rank sum test, 627-628b 
example for, 628-629, 628t 
Wilcoxon signed rank test, 615b 
Latin square design 
ANOVA for, 481 
definition of, 477 
example for, 478, 478t 
history of, 477-478 
procedure for constructing, 478b 
example for, 479-480, 479t, 480t 
Law of large numbers, 166-167b, 
166-168 
for Bernoulli random variable, 167 
definition of, 166-167b 
example for, 167-168 
proof of, 167 
Law of total probability, 75b 
example for, 75-76 
Laws of probability, 2 
Least-squares estimators 
derivation of, 416-421, 420f 
Gauss-Markov theorem, 424b 
inferences on, 428-437 
ANOVA approach to, 434-436, 
435t 
exercises for, 436-437 
for multiple linear regression model, 
446 
properties of, 422-425 
Least-squares line 
definition of, 416 
procedure for fitting, 418b 
example for, 419-420, 420f 


Least-squares regression model 
error independence, 453 
example for, 739-743, 743f 
homoscedasticity and, 452, 452f 
linearity and, 452 
normality of errors, 453 

“Leave-one-out” method, 661 

Leptokurtic, 99 

Level, in experimental design, 468 

Level of significance, 341 

Levene’s test, ANOVA and, 518, 722 

Likelihood function 
Bayesian inference and, 561 
Bayesian point estimation and, 

562-563, 565, 576-577, 577f 
example for, 564-565, 565t 
definition of, 235-236 
EM algorithm and, 670-671 
example for, 236 
likelihood ratio and, 356 
for uniform probability distribution, 
240, 240f 
Likelihood ratio, 355-361 
definition of, 356 
LRTs, 357b 
Likelihood ratio tests (LRTs), 355-361 
definition of, 357b 
examples for, 357-360 
exercises for, 360-361 
procedure for, 359b 
UMP tests, 355-356 
Lilliefors test, 222 
Limit theorems, 163-173 
central limit theorem, 168-171, 
168b 

Chebyshev’s theorem, 164-166, 
164b 

exercises for, 171-173 

law of large numbers, 166-167b, 
166-168 

Linear regression models, 411-463 
ANOVA in, 556-557 
computer examples for, 455-461 

Minitab, 455-456 
SAS, 458-461 
SPSS, 457-458 
correlation analysis, 440-444 
exercises for, 444 
inferences on least-squares 
estimators, 428-437 
ANOVA approach to, 434-436, 
435t 
exercises for, 436-437 


introduction to, 412-413 
matrix notation for, 445-451 
exercises for, 450-451 
multiple linear regression model 
ANOVA for, 449-450, 449t, 450t 
definition of, 414 
exercises for, 450-451 
least-squares estimators for, 446 
matrix examples for, 447-448 
model for, 445 
procedure to obtain equation, 
447b 
sum of squares for errors for, 
446-447 
predicting a particular value of Y, 
437-440 
example for, 439 
exercises for, 440 
projects for, 461-463 
coefficient of determination, 
461-462 
outliers and high leverage points, 
462-463 
scatterplots for checking adequacy, 
461 
regression diagnostics, 451-453 
simple, 413-428, 413f, 414f 
derivation of estimators, 416-421, 
420f 
estimation of error variance, 425 
exercises for, 425-428 
least-squares estimator properties, 


422-425 

method of least squares, 415-416, 
415f 

quality of regression, 421-422, 
421f, 422f 


summary for, 454 
Linearity, least-squares regression model 
and, 452 
Logarithmic transformations, for 
ANOVA, 555 
Log-likelihood function, 237 
EM algorithm and, 680 
Log-normal distribution, 129-130 
examples for, 130-131 
Loss function, 489-491, 490f 
for Bayesian estimate, 569-570 
bias and variance in, 491-492 
quadratic, 491, 491f 
Loss, in Bayesian decision theory, 589 
Lower quartile 
definition of, 27 
example for, 28-29 
LRTs. See Likelihood ratio tests 


M 


Maclaurin’s expansion, with Poisson 
random variable, 120 
Margin of error 
definition of, 303 
large sample confidence interval and, 
303-306 
examples for, 305-306 
Marginal probability density function 
with Bayes theorem, 562 
definition of, 143 
examples for, 143-146, 143t, 145f 
Marginal probability mass function 
definition of, 143 
examples for, 144-146, 145f 
Markov, A.A., 751 
Markov chain Monte Carlo (MCMC), 
681-697 
algorithms for, 682 
in Bayesian analysis, 682 
in Bayesian estimation, 562 
Chapman-Kolmogorov equation, 
753-754 
construction of, 683-685 
exercises for, 696-697 
Gibbs algorithm, 692-695, 693f 
assumption for, 692-693, 693f 
example for, 693-694 
for MCMC, 682, 692-695, 693f 
summary of, 695b 
issues in, 695-696 
Metropolis algorithm, 682, 685-688 
for continuous distribution, 
685-686b 
for discrete distribution, 685b 
example for, 686-688 
in MCMC, 682, 685-688 
target distribution from, 688, 689f 
Metropolis-Hastings algorithm, 682, 
688-692 
continuous case, 689-690, 690b 
discrete case, 689b 
example for, 690-692 
generalizations of, 690 
in MCMC, 682, 688-692 
use of, 688 
Monte Carlo integration, 682, 683b 
objective of, 682 
random walk chain, 753-754 


references for, 696 
review for, 751-756 
transition matrices for, 752 
examples for, 752-755 
transition probabilities for, 751 
example for, 753-754 
Masuyama, Motosaburo, 465 
Matched pairs test 
hypothesis testing and, 382, 
383-384, 383b 
two independent sample test v., 
384-385 
Mathematical expectation. See Expected 
value 
Mathematical statistics, 2 
MATLAB, for statistics analysis, 39 
Matrix notation, for linear regression, 
445-451 
Maximization step (M-step), of EM 
algorithm, 671-673 
Maximum likelihood equations, 240 
bootstrap method and, 664 
example for, 240-242 
Maximum likelihood estimators (MLEs) 
Bayesian inference and, 564 
example for, 564-565, 565t 
consistency of, 268-269 
definition of, 236 
EM algorithm and, 670-671 
examples for 
with gamma distribution, 242-243 
with geometric distribution, 
237-238 
with maximum likelihood 
equations, 240-242 


with Poisson distribution, 238-239 


with random sample, 239-240, 
240f 
invariance property of, 243 
large sample confidence interval and, 
302-303 
likelihood ratio and, 356 
method for, 237, 237b 
Minitab example for, 284-285 
sufficient statistic and, 260 
example for, 261 
unbiased estimators and, 252 
MCMC. See Markov chain Monte Carlo 
Mean 
alternate method of estimating, 287 
Bayesian point estimation, 575-576 
of binomial random variable, 
101-102, 118-119, 118b 


bootstrap confidence interval 
procedure to find, 667-668b 
example for, 668 
of chi-square distribution, 192 
of chi-square random variables, 
136b 
definition of, 26 
example for, 28 
of exponential random variables, 
134b 
of gamma random variable, 132b 
grouped 
definition of, 30 
example for, 30-31, 30t, 31t 
large sample confidence interval for 
difference of two, 321-322, 321b 
Minitab examples for, 43-44 
for nonparametric tests, 600-601, 
601f 
of normal random variable, 126b 
of Poisson random variable, 120, 
120b 
sample, 185 
SAS example for, 220-221 
small sample confidence interval for 
difference of two, 322b, 323-324 
SPSS examples for, 46-47 
statistical inference and, 561 
of Student t-distribution, 199-200 
sufficiency of, 256-259 
of uniform random variable, 123b, 
124 
Mean square block (MSB), ANOVA and 
randomized complete block 
design, 529-535 
Mean square error (MSE) 
ANOVA and completely randomized 
design, 505-506b, 512-513 
example for, 518-522 
ANOVA and randomized complete 
block design, 529-535 
definition of, 250, 428, 504 
loss function and, 491 
null hypothesis and, 505 
Mean square treatment (MST) 
ANOVA and completely randomized 
design, 505-506b, 513 
example for, 518-522 
ANOVA and randomized complete 
block design, 529-535 
definition of, 505 
null hypothesis and, 505 
Median 
bootstrap confidence interval 
procedure to find, 668b 
definition of, 27 
example for, 28-29 
grouped 
definition of, 31 
example for, 32, 32t 
Minitab examples for, 43-44 
for nonparametric tests, 600-602, 
601f 
in order statistics, 208 
sample, 185 
SPSS examples for, 46-47 
Median test, 620-625, 622t, 624t 
large sample, 622-623b 
example for, 623-624, 623t, 
624t 
Minitab example for, 643-644 
procedure for, 621b 
Members, 747 
Mendel, Gregor, 73 
Mesokurtic, 99 
Method of distribution functions, 
154-156, 158 
find cdf with, 155b, 156 
Method of least squares, for linear 
regression models, 415-416, 
415f 
Method of maximum likelihood, 
235-246 
exercises for, 244-246 
likelihood function in, 235-236 
example for, 236 
maximum likelihood estimators, 
236 
examples for, 237-243, 240f 
method for, 237, 237b 
Method of moments, 227-235 
definition of, 228b 
examples for 
for mean and variance, 230-231 
Poisson distribution, 232-233 
for population parameters, 
228-230 
population probability density 
function, 231-232 
exercises for, 233-235 
generalized, 233 
maximum likelihood estimators 
with, 240 
procedure for, 228b 
unbiased estimators and, 250 
uniqueness of, 232 
Metropolis algorithm 
for continuous distribution, 
685-686b 
for discrete distribution, 685b 
example for, 686-688 
in MCMC, 682, 685-688 
target distribution from, 688, 689f 
Metropolis-Hastings (M-H) algorithm 
continuous case, 689-690, 690b 
discrete case, 689b 
example for, 690-692 
generalizations of, 690 
Gibbs algorithm and, 694 
in MCMC, 682, 688-692 
use of, 688 
mgf. See Moment-generating function 
M-H algorithm. See Metropolis-Hastings 
algorithm 
Microsoft Excel, for statistics analysis, 
39 
Milk, temperature and spoilage of, 
497 
Minimal sufficiency, 277-279 
definition of, 277 
examples for, 277-279 
exercises for, 279-282 
Minimum variance unbiased estimator 
(MVUE) 
definition of, 251 
example for, 279 
Minitab 
ANOVA examples, 543-546 
completely randomized design, 
543-545 
randomized complete block design, 
545-546 
Tukey’s method, 546 
Bayesian computation examples, 596 
descriptive statistics examples, 41-46 
box plots, 44-45, 45f 
histogram, 43, 43f 
stem-and-leaf, 42-43 
test of randomness, 45-46 
empirical method examples, 
698-699 
experimental design examples, 
494 
hypothesis testing examples, 
399-403 
interval estimation examples, 
330-332 
large sample, 331 
small sample, 330-331 


linear regression model examples, 
455-456 
nonparametric tests examples, 
642-646 
Friedman test, 645-646 
Kruskal-Wallis test, 644-645 
median test, 643-644 
sign test, 642 
Wilcoxon signed rank test, 643 
point estimation examples, 283-285 
probability theory examples, 
109-110, 175-177 
randomness test examples, 45-46, 
654-655 
resources for, 41-42 
sampling distribution examples, 219 
for statistics analysis, 39 
Mixture distribution, 180-181 
MLEs. See Maximum likelihood 
estimators 
Mode 
definition of, 28 
example for, 28 
SPSS examples for, 46-47 
Model building, 727-733 
bivariate data, 730-732, 730f 
example for, 730-732, 731f, 732f 
exercises for, 732-733 
simple model for univariate data, 
727-729 
example for, 728-729, 729f 
in statistics, 3 
Modified z-score test, for outliers, 709 
Moment-generating function (mgf), 
92-107 
of Bernoulli random variable, 115 
of binomial random variable, 
118-119, 118b 
of chi-square random variables, 
136b 
definition of, 100 
joint distribution, 150 
examples for, 101-105 
of exponential random variables, 
134b 
of gamma random variable, 132b 
of normal random variable, 126b, 
191 
of Poisson random variable, 120, 
120b 
properties of, 104b 
of uniform random variable, 123b, 
124 


Moments, 92-107 
Monte Carlo integration, 682, 683b 
More efficient estimator, 272 
Most powerful test, 350 
MSB. See Mean square block 
MSE. See Mean square error 
MST. See Mean square treatment 
M-step. See Maximization step 
Multifactor experiments 
definition of, 469 
example for, 469-470 
Multinomial coefficients, 67, 67b 
Multinomial distribution, testing 
parameters of, 390-392 
examples for, 390-392 
summary of, 390b 
Multiphase sampling, 11 
Multiple comparisons, with ANOVA, 
536-542, 538t 
example for, 538-541 
exercises for, 541-542 
Tukey's method, 537b, 538t 
Multiple linear regression model 
ANOVA for, 449-450, 449t 
example for, 450, 450t 
definition of, 414 
exercises for, 450-451 
least-squares estimators for, 446 
matrix examples for, 447-448 
model for, 445 
procedure to obtain equation, 447b 
sum of squares for errors for, 
446-447 
Multiple mode presence, with histogram, 
19-20 
Multiplication principle, 64b 
example for, 64 
Multivariate, 40 
Mutually exclusive, 55 
Mutually independent, 74 
MVUE. See Minimum variance unbiased 
estimator 


N 
Negatively correlated, 441-442 

Newton-Raphson in one dimension, 
288 

Neyman, Jerzy, 337-338 

Neyman-Fisher factorization criteria, 
254-256, 254b 

Neyman-Pearson lemma, 349-355 

example for, 352-353 


chi-square test, 353-354 
exercises for, 355 
likelihood ratio and, 356 
likelihood ratio test and, 358 
procedure for applying, 353b 
theorem for, 350-352 
Nightingale, Florence, 701-702 
Noise, 468 
Nominal data, 5 
Noninformative priors, in Bayesian point 
estimation, 565 
example of, 566, 566t 
Nonparametric confidence interval, 
601-606, 602f 
exercises for, 605-606 
median for, 602-603, 602f 
example for, 603-605 
procedure for finding, 603b 
Nonparametric hypothesis test 
for multiple samples, 630-640 
exercises for, 638-640 
Friedman test, 634-638 
Kruskal-Wallis test, 631-634 
for one sample, 606-620 
exercises for, 619-620 
paired comparison tests, 617-618 
sign test, 607-611 
Wilcoxon signed rank test, 611-617 
for two samples, 620-630 
exercises for, 629-630 
median test, 620-625, 622t, 624t 
Wilcoxon rank sum test, 625-629 
Nonparametric tests, 599-655 
computer examples for, 642-652 
Minitab, 642-646 
SAS, 648-652 
SPSS, 646-648 
introduction to, 600-601, 601f 
nonparametric confidence interval, 
601-606, 602f 
example for, 603-605 
exercises for, 605-606 
median for, 602-603, 602f 
procedure for finding, for median, 
603b 
nonparametric hypothesis test for 
multiple samples, 630-640 
exercises for, 638-640 
Friedman test, 634-638 
Kruskal-Wallis test, 631-634 
nonparametric hypothesis test for 
one sample, 606-620 
exercises for, 619-620 


paired comparison tests, 
617-618 
sign test, 607-611 
Wilcoxon signed rank test, 611-617 
nonparametric hypothesis test for 
two samples, 620-630 
exercises for, 629-630 
median test, 620-625, 622t, 624t 
Wilcoxon rank sum test, 625-629 
parametric tests v., 733-735 
projects for, 652-655 
randomness test, 653-655 
Wilcoxon tests v. normal 
approximation, 652 
summary for, 640-642, 641t 
Nonsampling errors, 12 
Normal approximation 
to binomial distribution, 213-216, 
214f 
continuity correction for, 214-215, 
214-215b, 215f 
example for, 215-216 
Wilcoxon tests v., 652 
Normal distribution, precision of, 
576-577 
Normal populations 
confidence interval of, 295 
project for, 336 
EM algorithm for, example for, 
677-678 
large sample approximations and, 
212-213 
sampling distributions associated 
with, 191-207 
chi-square distribution, 192-198, 
194f, 195f 
exercises for, 204-207 
F-distribution, 202-204, 202f, 203f 
Student t-distribution, 198-201, 
199f, 200f 
Normal probability distribution, 
125-131, 126f, 128f, 129f 
definition of, 125 
estimators and estimates of, 227 
examples for, 126-128 
plotting of, 128-129, 128f, 129f 
SAS example for, 219-220 
Normal probability plot 
for ANOVA, 518-522, 519f, 520f 
for assumption testing, 714-716, 
715f, 716f, 717f 
data transformation and, 717-719, 
718f, 720f 
example for tying it all together, 
735-743, 737f, 738f, 741f 
for hypothesis testing, 365-366, 366f 
SAS examples for, 48-50 
Normal random variable 
definition of, 104, 125 
examples for, 104-105, 126-128 
mean and variance of, 126b 
mgf of, 126b, 191 
Normality 
checking assumptions of, 714-716, 
715f, 716f, 717f 
of errors, 453 
in hypothesis testing, 364 
test for, 222-223, 517 
Normal-score plot, 222 
construction of, 223b 
Nuisance variables, 468 
Null hypothesis 
ANOVA for, 510 
Bayesian hypothesis testing, 584-588 
definition of, 338 
errors and, 341-342, 341t 
examples for, 340 
testing, 365-366, 366f 
two population means, 376-379 
exercises for, 348-349 
MST and MSE, 505 
necessity of, 340 
p-value and, 362 
example for, 362-363 
sample size and, 346-348 
sign test, 607 
two population means, 375-376, 
375b 
Null subset, 55 
Numerical description, of data, 26-39 
box plots, 33-35 
exercises for, 35-39 
grouped data, 30-33 
Numerical unbiasedness and consistency, 
287 


O 
Observables 
for Bayesian decision theory, 591-592 
examples for, 592-594 
definition of, 591 
predicting future, 596-597 
Observational experiment 
definition of, 468 
designed experiment v., 468 
randomization and, 474 
Observed frequency, 389 
chi-square tests for, 389-390 
One-factor-at-a-time design, 483-485, 
485f 
definition of, 483-484 
example for, 484, 485f 
One-to-one correspondence, 750 
One-way analysis of variance, 470. See 
also Completely randomized 
design 
Optimal decision 
in Bayesian decision theory, 591 
procedure to find, 591b 
Optimization algorithms, 243 
Order statistics, 207-212 
definition of, 207-208 
distribution of, 209 
example for, 208 
exercises for, 210-212 
joint pdf of, 208, 210 
Ordinal data, 5 
Orthogonal Latin squares. See 
Greco-Latin squares 
Outliers, 708-713 
box plot for, 709 
dealing with, 711-712 
definition of, 708 
detecting, 708-709 
example for, 710-711, 710-711t, 711f 
tying it all together, 735-743, 737f 
exercises for, 712-713 
histogram for, 19-20 
in linear regression models, 462-463 


P 
p-Value, 361-363 
approach to ANOVA, 515-517, 517t 
definition of, 361 
examples for, 362-363 
large sample hypothesis test for, 368b 
reporting test results as, 362 
for sign test, 609 
steps to find, 361b 
Paired comparison tests, 617-618 
Paired t-Test 
Minitab example for, 402-403 
SPSS example for, 405, 407-408 
Pairwise independent, 74 
Parametric tests 
definition of, 600 
nonparametric tests v., 733-735 
Pareto effect, 14 


Pareto graph 

definition of, 14 

example of, 14, 15f 

uses of, 14-15 
Pareto, Vilfredo, 14 
Partition, 75 
Pascal, Blaise, 54 
pdf. See Probability density function 
Pearson, Karl, 291-292 
Permutation, 65, 65b 
Pie chart, 15, 15t, 16f 
Pivotal method 

for confidence interval, 293-298, 

295f, 296f 
example for, 296-298 


procedure for, 293-298, 295f, 296f 


exercises for, 298-300 
for large sample confidence interval, 
300 
Pivotal quantity, sampling distributions 
of, 293-294 
Placebo, 471 
Platykurtic, 99 
pmf. See Probability mass function 
Point estimation, 225-289 
Bayesian, 562-579 
computer examples for, 283-285 
introduction to, 226-227 


jackknife method procedure for, 660b 


example for, 660-661, 660t, 
661t 
method of maximum likelihood, 
235-246 
exercises for, 244-246 
likelihood function in, 235-236 
maximum likelihood estimators, 
236 
method of moments, 227-235 
definition of, 228b 
exercises for, 233-235 
generalized, 233 
Poisson distribution, 232-233 
for population parameters, 
228-230 
population probability density 
function, 231-232 
procedure for, 228b 
for sample mean and variance, 
230-231 
point estimator properties, 246-282 
consistency, 266-269 
efficiency, 270-277 
exercises for, 262-265, 279-282 


minimal sufficiency and UMVUEs, 
277-279 
sufficiency, 252-262 
unbiased estimators, 247-252 
projects for, 285-289 
alternate method of estimating 
mean and variance, 287 
asymptotic properties, 285-286 
averaged squared errors, 287 
empirical distribution function, 
288-289 
Newton-Raphson in one 
dimension, 288 
numerical unbiasedness and 
consistency, 287 
robust estimation, 286 
statistical inference and, 561 
summary for, 282-283 
Point estimators. See also Estimators; 
Unbiased estimators 
computer examples for, 283-285 
consistency, 266-269, 266f 
definition of, 266 
examples for, 267-269 
exercises for, 279-282 
test for, 267-268, 267b, 268b 
of unbiased estimator, 266b 
uniqueness and, 269 
efficiency, 270-277 
Cramér-Rao inequality, 273b, 
274 
Cramér-Rao procedure to test, 
274b 
definition of, 270 
efficient estimator, 274, 274b 
examples for, 270-272, 274-276 
exercises for, 279-282 
relative, 272-273 
relative test for, 270b 
uniformly minimum variance 
unbiased estimator, 273 
minimal sufficiency and UMVUEs, 
277-279 
definition of, 277 
examples for, 277-279 
exercises for, 279-282 
projects for, 285-289 
alternate method of estimating 
mean and variance, 287 
asymptotic properties, 285-286 
averaged squared errors, 287 
empirical distribution function, 
288-289 


Newton-Raphson in one 
dimension, 288 
numerical unbiasedness and 
consistency, 287 
robust estimation, 286 
sufficiency 
examples for, 252-254, 256-261 
exercises for, 262-265 
jointly sufficient, 257, 258b 
Neyman-Fisher factorization 
criteria, 254-256, 254b 
in point estimation, 252-262 
Rao-Blackwell theorem, 262, 262b 
sufficient statistic and maximum 
likelihood estimators, 260 
verification of, 256b 
summary for, 282-283 
unbiased estimators, 247-252 
definition of, 247 
examples for, 247, 249-251 
exercises for, 262-265 
mean square error, 250 
Rao-Blackwell theorem and, 262 
sample mean as, 247-248 
sample variance as, 248 
Poisson probability distribution, 
119-122 
binomial probability distribution 
and, 121, 121b 
continuous random variable and, 
122 
definition of, 102 
discrete random variable and, 120 
efficiency example for, 275 
example for, 102 
find cdf with, 156 
generating samples from, 181 
maximum likelihood estimators 
with, 238-239 
method of moments and, 232 
recursive calculation of, 182 
Poisson random variable 
definition of, 119 
mean, variance, and mgf of, 120, 
120b 
probability and, 120-121 
Poisson, Siméon-Denis, 119 
Political polls, 4 
Population, 4 
Population mean 
in hypothesis testing, 364 
large sample confidence interval and, 
301-302 


small sample hypothesis testing of 
two, 375-376, 375b 
example for, 376-379 
Population moment, method of 
moments for, 228 
Population parameters 
Bayesian inference and, 564 
example for, 564-565, 565t 
confidence interval concerning two, 
321-329 
difference of two means, 321-324, 
321b, 322b 
exercises for, 327-329 
for probability, 325-326, 325b 
for variance, 326-327, 326b 
large sample confidence interval, 
difference of two means, 
321-322, 321b 
method of moments for, 228 
examples for, 228-230 
procedure for, 228b 
small sample confidence interval, 
difference of two means, 322b, 
323-324 
statistical hypothesis and, 338 
Population probability density function, 
method of moments and, 
231-232 
Population variance 
confidence interval for, 315-320, 
316f 
examples for, 317-318 
exercises for, 318-320 
procedure for, 317b 
in hypothesis testing, 364 
Positive transition matrix, 755 
Positively correlated, 441 
Posterior distribution 
in Bayesian point estimation, 
562-563, 566 
example for, 567, 571-576, 574f 
for continuous random variable, 
567-569 
credible intervals and, 580-581, 581f 
definition of, 563 
Posterior median, in Bayesian estimate, 
570 
Posterior odds ratio, 585 
Posterior probability 
Bayesian inference and, 561 
Bayesian point estimation and, 564 
example for, 564-565, 565t 
definition of, 74, 77 
Power, 349 
Precision, of normal distribution, 
576-577 
Predictor variable. See Independent 
variable 
Prior information, in Bayesian decision 
theory, 589 


Prior odds ratio, 585 
Prior probabilities 
Bayesian inference and, 561 
Bayesian point estimation and, 
562-563, 576-577, 577f 
example for, 564-565, 565t 
definition of, 77 
Probability density function (pdf) 
conditional, 144 
continuous 
definition of, 86 
examples for, 87-90, 87f, 88f, 89f, 
90f 
of F-distribution, 202, 202f 
find with cdf, 155 
joint, 159 
of kth order statistic, 208 
example for, 209 
of log-normal distribution, 129-130 
marginal, 143 
Minitab examples for, 175-176 
random variable functions and, 
156-157 
Student t-distribution and, 198 
Probability distribution. See also 
Binomial probability 
distribution; Conditional 
probability distribution; 
Exponential probability 
distribution; Gamma probability 
distribution; Joint probability 
distributions; Normal probability 
distribution; Poisson probability 
distribution; Standard normal 
probability distribution; Uniform 
probability distribution 
Bayesian point estimation, 574-575 
of correlation, 442 
of order statistic, 209 
of sample statistic, 185 
statistical hypothesis and, 338 
testing to identify, 395-397 
Probability distribution functions, 
114-141 
binomial probability distribution, 
114-119 


gamma probability distribution, 
131-136, 132f, 134f, 135f 
method of, 154-156 
normal probability distribution, 
125-131, 126f, 128f, 129f 
Poisson probability distribution, 
119-122 
references for, 114 
uniform probability distribution, 
122-125, 122f 
Probability function (pf). See also 
Probability mass function 
of Bernoulli random variable, 115 
binomial distribution, 101 
of univariate random variable, 146 
Probability integral transformation, 
157-158 
definition of, 157 
example for, 157-158 
Probability mass function (pmf) 
discrete 
definition of, 84 
examples for, 85-86, 85f, 86f 
marginal, 143 
Probability plots, for ANOVA, 517-518 
Probability theory, 2 
basic properties of, 58b 
examples for, 58-60, 59t 
chi-square distribution, example for, 
197 
computer examples for, 108-111, 
175-180 
Minitab, 109-110, 175-177 
SAS, 110-111, 178-180 
SPSS, 110, 177 
computing method for, classical 
approach, 56, 56b 
conditional 
definition of, 71 
example for, 72-73 
exercises for, 78-83 
independence, and Bayes’ rule, 
71-83 
properties of, 72b 
counting techniques and calculation 
in, 63-71 
exercises for, 69-71 
definition of, 54 
axiomatic, 57-58, 57b 
classical, 56b 
frequency, 57b 


informal, 55b 
frequency, 57b 
frequency interpretation of, 67 
examples for, 67-69 
in genetics, 73-74 
informal, 55b 
introduction to, 53-54, 114 
joint probability distributions, 
141-154, 145f 
exercises for, 150-154 
law of total, 75b 
example for, 75-76 
laws of, 2 
limit theorems, 163-173 
central limit theorem, 168-171, 
168b 
Chebyshev’s theorem, 164-166, 
164b 
exercises for, 171-173 
law of large numbers, 166-167b, 
166-168 
moments and moment-generating 
functions, 92-107 
exercises for, 105-107 
skewness and kurtosis, 98-105 
Poisson random variable and, 
120-121 
projects for, 112, 180-182 
random events and, 55-63 
exercises for, 60-63 
in random variable, 84 
random variable functions, 154-163 
exercises for, 161-163 
method of distribution functions, 
154-156, 158 
pdf, 156-157 
probability integral transformation, 
157-158 
transformation method, 159-161 
random variables and probability 
distributions, 83-92 
exercises for, 90-92 
special distribution functions, 
114-141 
binomial probability distribution, 
114-119 
exercises for, 136-141 
gamma probability distribution, 
131-136, 132f, 134f, 135f 
normal probability distribution, 
125-131, 126f, 128f, 129f 
Poisson probability distribution, 
119-122 
selection of, 136 
uniform probability distribution, 
122-125, 122f 
of Student t-distribution, 199-200, 
200f 
summary for, 107-108, 173-174 
of type I and type II errors, 341 
PROC UNIVARIATE 
examples for, 48-50 
to test for normality, 180 
Proper subset, 748 
Proportion 
hypothesis testing for, 379-380b 
example for, 380 
large sample confidence interval for, 
302-303, 325b 
example for, 303, 325 
Proportion inference, in Bayes inference, 
564 
Proportional stratified sampling, 10, 10t 


Q 
QQ plot. See Quantile quantile plot 
Quadratic loss function, 491, 491f 
for Bayesian estimate, 569-571 
example for, 571-572 
Qualitative data, 5 
Quality control, 4 
Quantile quantile plot (QQ plot), 128, 
128f, 705-706 
example of, 706 
with SAS, 180 
Quantitative data, 5 
Quenouille-Tukey jackknife. See 
Jackknife method 


R 


R, for statistics analysis, 39 
Random events, probability and, 55-63 
Random experiment, 55 
Random process, 751 
Random sample 
definition of, 184 
maximum likelihood estimators 
with, 239-240, 240f 
in MCMC, 683 
median test for large, 622-623b 
example for, 623-624, 623t, 624t 
obtaining from different 
distributions, 221-222 
sample mean of, as unbiased 
estimator, 247-248 
example for, 249 


sign test for large, 610b 
example for, 610-611 
sufficient estimators and, 253-254 
Wilcoxon rank sum test for large, 
627-628b 
example for, 628-629, 628t 
Random variables. See also Continuous 
random variable; Discrete 
random variable 
Bernoulli 
law of large numbers for, 167 
method of moments and, 228-229 
mixture distribution and, 180-181 
probability function of, 115 
sufficient estimators and, 252-253 
binomial 
definition of, 101 
examples for, 116-118 
expected value of, 118-119, 118b 
mean of, 101-102, 118-119, 118b 
moment-generating function of, 
118-119, 118b 
SAS examples for, 178-180 
variance of, 118-119, 118b 
binomial probability distribution of, 
101 
Chebyshev’s theorem and, 165 
chi-square 
degrees of freedom of, 193 
F-distribution and, 202 
from gamma random variable, 193 
mean, variance, and mgf of, 136b 
from standard normal random 
variable, 193, 194f 
conditional probability distribution 
of, 144 
continuous 
cumulative distribution function, 
86-90, 87b, 87f, 88f, 89f, 90f 
definition of, 86 
expected value of, 93-94, 94f 
probability density function for, 
86-90, 87f, 88f, 89f, 90f 
counting, 119 
definition of, 83 
discrete 
cumulative distribution function 
for, 84 
definition of, 84 
example for, 84-85 
expected value of, 92-94, 94f, 96 
probability mass function for, 84 
uniform distribution of, 96 


examples for, 83 
exercises for, 90-92 
expectation of function of, 95b 
exponential 
definition of, 133 
mean, variance, and mgf of, 134b 
as a function, 85, 85f 
functions of, 154-163 
exercises for, 161-163 
method of distribution functions, 
154-156, 158 
pdf, 156-157 
probability integral transformation, 
157-158 
transformation method, 159-161 
gamma 
chi-square random variable from, 
193 
mean, variance, and mgf of, 132b 
independent 
distribution of, 160-161, 161f 
examples for, 144-146, 145f 
pdf and, 144 
in Student t-distribution, 200 
with joint probability function, 142 
kth moment about the mean, 99 
kth moment about the origin of, 99 
Minitab examples for, 109-110 
moment-generating function of, 
100-105 
normal 
definition of, 104, 125 
examples for, 104-105, 126-128 
mean and variance of, 126b 
mgf of, 126b, 191 
Poisson 
definition of, 119 
mean, variance, and mgf of, 120, 
120b 
probability and, 120-121 
Poisson distribution, 102 
probability in, 84 
in sample, 184 
simulation 
with exponential probability 
distribution, 221 
with uniform probability 
distribution, 221-222 
standard deviation of, 95 
standard normal 
chi-square random variable from, 
193, 194f 
definition of, 103 
example for, 103-104 
in sampling distribution, 192 
statistical hypothesis and, 338 
uniform, mean, variance and mgf of, 
123b, 124 
univariate, probability function of, 
146 
variance of 
definition, 95 
examples, 96 
Random walk chain, 753-754 
Randomization 
definition of, 472 
example for, 473-474, 473t 
procedure for, 472-473b 
in randomized complete block 
design, 474-475b 
Randomized complete block design 
ANOVA, 526-535, 528t, 529t 
computational procedure for, 
530-531b 
decomposition of, 527-529 
example for, 532-533 
exercises for, 534-535 
definition of, 474 
examples for, 475 
Minitab example for, 545-546 
procedure for, 474-475b 
with replications 
determining minimum number of, 
476-477 
examples for, 476 
procedure for, 475-476b 
Randomness test, 653-655 
example for, 655 
Minitab examples for, 45-46, 
654-655 
procedure for, 654b 
Wald-Wolfowitz test as, 517, 653 
Random-walk Metropolis, 688 
Rao, Calyampudi Radhakrishna, 
225-226 
Rao-Blackwell theorem, 262, 262b 
Recessive trait, 73 
Recurrent state, 755 
Recursive calculation, of binomial and 
Poisson probabilities, 182 
Regression analysis, 412 
procedure for, 412b 
quality of, 421-422, 421f, 422f 
use of, 412 
Regression coefficients 
confidence interval for, 429b 
example for, 430-431 
hypothesis testing for, 431b, 432b 
example for, 432-433 
Regression diagnostics, 451-453 
error independence, 453 
homoscedasticity, 452, 452f 
linearity, 452 
normality of errors, 453 
Regression models 
correlation analysis in, 440 
examples for, 739-743, 742f 
procedure for, 412b 
Relative efficiency 
definition of, 272 
example for, 272-273 
Relative frequency 
definition of, 17 
example of, 17-18 
Relatively more efficient 
definition of, 270 
procedure to test for, 270b 
Replacement 
sampling with 
objects not ordered, 66 
objects ordered, 64-65 
sampling without 
objects not ordered, 65-66 
objects ordered, 65 
Replication, 471 
Representative sample, 8 
Response variable 
definition of, 467 
examples for, 469-470 
Robust design, 489 
Robust estimation, 286 
Robustness, ANOVA and, 518 
Robustness statistics, 492-493, 493t 
Rules of decision, 338 
Run test, with Minitab, 45-46 


S 

Sample, 184 
Sample data 
definition of, 4 
errors in, 11-12 
size of, 12 
Sample mean (SM), 185 
in ANOVA, 511-512, 512f 
consistency of, 267 
distribution of, 185 
efficiency of, 270-272 
example for, 186-188 
hypothesis testing with, 367 
large sample approximations and, 
212-213 
method of moments for, 230-231 
of random sample as unbiased 
estimator, example for, 249 
theorem for, 186 
Sample median, 185 
example for, 208 
Sample moment, method of moments 
for, 228 
Sample point, 55 
Sample size 
definition of, 184 
hypothesis testing and, 346-348 
large sample approximations and, 
212-213 
large sample confidence interval and, 
303-306 
examples for, 305-306 
in optimal experimental design, 
487-489 
Sample space, 55 
Sample standard deviation 
ANOVA and, 518 
in hypothesis testing, 364 
hypothesis testing with, 367 
Sample statistic, 185 
Sample variance, 185 
consistency of, 268-269 
example for, 186-188 
with chi-square distribution, 197 
expected value of, 188-189 
theorem for, 186 
as unbiased estimator, 248 
Sampling 
with replacement 
objects not ordered, 66 
objects ordered, 64-65 
without replacement 
objects not ordered, 65-66 
objects ordered, 65 
Sampling distributions, 183-223 
associated with normal populations, 
191-207 
chi-square distribution, 192-198, 
194f, 195f 
exercises for, 204-207 
F-distribution, 202-204, 202f, 203f 
Student t-distribution, 198-201, 
199f, 200f 
bootstrap methods for, 663 
computer examples for, 219-221 
Minitab, 219 
SAS, 219-221 
SPSS, 219 
confidence interval based on, 334 
definition of, 184-185 
exercises for, 189-191 
finite population, 187-189 
introduction to, 184-191 
large sample approximations, 
212-218 
exercises for, 216-218 
normal approximation to binomial 
distribution, 213-216, 214f, 215f 
order statistics, 207-212 
exercises for, 210-212 
of pivotal quantity, 293-294 
power and, 350 
projects for, 221-223 
simulating random variables, 
221-222 
simulation experiments, 222 
test for normality, 222-223 
statistical inference and, 560 
summary for, 218 
Sampling errors, 12 
Sampling schemes, 8-12 
SAS 
ANOVA examples, 548-554 
completely randomized design, 
548-549 
Tukey's method, 549-554 
commands for, 47-48, 50 
descriptive statistics examples, 47-50 
empirical method examples, 
698-699 
experimental design examples, 
494-497 
general format of program in, 47b 
hypothesis testing examples, 
405-408 
interval estimation examples, 333 
linear regression model examples, 
458-461 
nonparametric tests examples, 
648-652 
Kruskal-Wallis test, 650-652 
Wilcoxon rank sum test, 648-649 
probability theory examples, 110-111, 
178-180 
references for, 50 
sampling distribution examples, 
219-221 
for statistics analysis, 39 


Savage, Leonard Jimmie, 588 
Scale parameter, 131 
Scatter diagram, for linear regression 
model, 413, 413f 
Scatterplots, 704, 704f 
for bivariate data model building, 
730-732, 731f 
for checking adequacy, 461 
example for, 704-705, 705f 
tying it all together, 735-743, 
739f, 742f 
Scheffe’s method, 536 
Sequential experimental design, 487 
Set 
definition of, 747 
operations of, 748-750 
complement, 749, 749f 
difference, 749 
intersection, 748, 749f 
union, 748, 748f 
properties of, 749-750b 
Set theory, 747-750 
set definition, 747 
set operations, 748-750 
set properties, 749-750b 
Shape parameter, 131 
Side-by-side box plots, 704 
for variance test for equality, 
722 
Sign test, 607-611 
application of, 608b 
for large random sample, 610b 
example for, 610-611 
Minitab example for, 642 
paired comparisons, 617-618 
procedure for, 608b 
p-value method and, 609 
Wilcoxon signed rank test v., 611 
Signal-to-noise ratio, Taguchi methods 
and, 489 
Significance tests, bootstrap method and, 
663 
Simple linear regression model, 
413-428, 413f, 414f 
definition of, 414, 414f 
derivation of estimators, 416-421, 
420f 
estimation of error variance, 425 
exercises for, 425-428 
least-squares estimator properties, 
422-425 
method of least squares, 415-416, 
415f 
quality of regression, 421-422, 421f, 
422f 
Simple random sample 
advantages of, 8b 
definition of, 8 
effectiveness of, 9 
example for, 8 
Simple regression line, 420, 420f 
Simulation experiments, 222 
Simultaneous experimental design, 487 
Single-factor experiments 
definition of, 469 
example for, 469 
Size, of sample data, 4, 12 
Skewness, 98-105 
definition of, 99 
with histogram, 19-20 
SM. See Sample mean 
Small sample confidence intervals, 
310-315 
for difference of two means, 322b, 
323-324 
examples for, 311-312 
exercises for, 313-315 
Minitab examples for, 330-331 
procedure for, 310-311b, 311 
simulation of coverage of, 334 
Small sample hypothesis testing, 
364-365b 
example for, 365-366, 366f 
population means, 375-376, 375b 
example for, 376-379 
Smith-Satterthwaite procedure, 376 
Splus, for statistics analysis, 39 
Spread of data, with histogram, 19-20 
SPSS 
ANOVA examples, 538-541, 546-547 
completely randomized design, 
546-547 
Tukey’s method, 547 
descriptive statistics examples, 46-47 
histogram, 46 
stem-and-leaf, 46 
hypothesis testing examples, 
403-405 
interval estimation examples, 332 
linear regression model examples, 
457-458 
nonparametric tests examples, 
646-648 
Kruskal-Wallis test, 647-648 
Wilcoxon rank sum test, 646-647 
probability theory examples, 110, 177 
sampling distribution examples, 219 
for statistics analysis, 39 
Square root transformations, for ANOVA, 
555 
Squared error loss function. See 
Quadratic loss function 
SS. See Sum of squares 
SSB. See Sum of squares of blocks 
SSE. See Sum of squares for errors 
SSR. See Sum of squares of regression 
SST. See Sum of squares for treatment 
Standard deviation 
definition of, 26 
of discrete random variables 
definition of, 95 
examples, 96-98 
example for, 28 
Minitab examples for, 43-44 
SPSS examples for, 46-47 
statistical inference and, 561 
Standard error 
bootstrap method and, 663, 
665-666, 665b, 666-667b 
example for, 666-667 
definition of, 186, 665 
Standard normal probability distribution 
CLT and, 169 
Minitab examples for, 219 
Student t-distribution and, 199 
Standard normal random variable 
chi-square random variable from, 
193, 194f 
definition of, 103 
example for, 103-104 
in sampling distribution, 192 
State space, 751 
States of nature, 77 
Statistic 
definition of, 185 
sufficiency of, 252 
Statistical applications 
checking assumptions, 713-727 
ANOVA, 713 
data transformations, 716-719 
exercises for, 724-727 
normality, 714-716, 715f, 716f, 717f 
test of independence, 724 
t-test, 713 
variance equality, 719-724 
conclusion, 746 
graphical methods, 702-708, 704f, 
706f 
bar graph, 13-14, 13t, 14f 
box plots, 704 
dotplot, 703 
exercises for, 20-26, 707-708 
Pareto graph, 14, 15f 
pie chart, 15, 15t, 16f 
quantile-quantile plot, 705-706 
scatterplot, 704-705, 704f, 705f 
stem-and-leaf plot, 16-17, 16t 
introduction to, 702 
modeling issues, 727-733 
bivariate data, 730-732, 730f, 731f, 
732f 
exercises for, 732-733 
simple model for univariate data, 
727-729, 729f 
outliers, 708-713 
box plot for, 709 
dealing with, 711-712 
definition of, 708 
detecting, 708-709 
example for, 710-711, 710-711t, 
711f 
exercises for, 712-713 
parametric v. nonparametric analysis, 
733-735 
tying it all together, 735-746 
exercises for, 743-746 
Statistical concepts, 2 
Statistical decisions, 338 
Bayesian decision theory v., 588-589 
Statistical hypothesis 
definition of, 338 
elements of, 339b 
Statistical inference 
Bayesian inference v., 560 
definition of, 5 
Statistical methods 
definition of, 2 
uses of, 2-3 
Statistical software, 39-40 
Statistics 
central limit theorem in, 171 
Chebyshev’s theorem for, 165 
computers and, 39-40 
in decision making, 338 
definition of, 2-3 
in genetics, 73-74 
StatXact, for statistics analysis, 39 
Steady state, 756 
Stem-and-leaf plot 
definition of, 16 
example of, 16, 16t 
Minitab examples for, 42-43 
SAS examples for, 48-50 
SPSS examples for, 46 
use of, 17 
Stochastic matrix, 752 
Stochastic process, 751 
Stratified sampling 
definition of, 9 
examples for, 10, 10t 
steps for selecting, 9-10b 
uses of, 11b 
Student t-distribution, 198-201, 199f, 
200f 
definition of, 198 
examples for, 201 
exercises for, 204-207 
graphical behavior of, 199, 199f 
regression analysis and, 434 
Studentized range distribution, 536-537 
Subjective probability, Bayesian inference 
and, 560-561 
Subset 
definition of, 747 
proper, 748 
Sufficiency 
examples for 
Bernoulli random variables, 
252-253 
factorization theorem, 259-260 
jointly sufficient, 258-259 
mean, 256-259 
minimal, 277-279 
random sample, 253-254 
sufficient statistic and maximum 
likelihood estimators, 261 
exercises for, 262-265 
jointly sufficient 
definition of, 257 
factorization criteria for, 258b 
minimal, 277-279 
Minitab example for, 283-284 
Neyman-Fisher factorization criteria, 
254-256, 254b 
in point estimation, 252-262 
Rao-Blackwell theorem, 262, 
262b 
sufficient statistic and maximum 
likelihood estimators, 260 
verification of, 256b 
Sufficient estimator, 252 
Sufficient statistic 
definition of, 252 
for discrete distribution, 259-260 
maximum likelihood estimators and, 
260 
example for, 261 
Sum of squares (SS), ANOVA and, 
502-503 
Sum of squares for errors (SSE) 
ANOVA 
completely randomized design and, 
510-512, 518-522 
randomized complete block design 
and, 526-535 
for two treatments, 502-504 
calculation for, 420-421 
definition of, 416, 503 
independence of, 504 
least-squares estimators and, 
428-429 
for multiple linear regression model, 
446-447 
regression analysis and, 434 
Sum of squares for treatment (SST) 
ANOVA 
completely randomized design and, 
502-503, 510-512, 518-522 
randomized complete block design 
and, 526-535 
for two treatments, 502-504 
definition of, 503 
independence of, 504 
regression analysis and, 434 
Sum of squares of blocks (SSB), ANOVA 
and randomized complete block 
design, 526-535 
Sum of squares of regression (SSR), 
regression analysis and, 434 
Survival times, EM algorithm for, 
example for, 673-676 
Symmetric difference, of set, 749 
Systematic sample, 9, 9b 


T 


Taguchi, Genichi, 465-466, 489 
Taguchi loss function, 489-492, 490f 
bias and variance in, 491-492 
quadratic, 491, 491f 
Taguchi methods, 489-493, 490f, 491f 
exercises for, 492-493 
t-Distribution. See Student t-distribution 
Temperature, spoilage of milk and, 497 
Test for normality, 222-223 
with SAS, 180 
Tests of hypothesis, 338 
Tests of significance, 338 


Time series data 
definition of, 6 
example for, 6, 7t 
Total probability 
example for, 77-78 
law of, 75b 
Total SS. See Total sum of squares 
Total sum of squares (Total SS) 
ANOVA and completely randomized 
design, 502-503, 510-513 
example for, 518-522 
ANOVA and randomized complete 
block design, 528-535 
decomposition of, 510-511, 512f 
Transformation method, 159-161 
definition of, 159 
Transformations 
for ANOVA, 554-555 
checking assumptions of, 716-719 
example for, 717-719, 718f, 719f, 
720f 
Transient state, 755 
Transition matrix, 752 
examples for, 752-755 
positive, 755 
Transition probabilities, 751 
Treatment variable 
definition of, 467-468 
examples for, 469-470 
Tree diagram, 64, 64f 
Trial, 55 
Trimmed mean, example of, 29 
t-Test 
ANOVA v., 501, 506-508, 536 
assumptions of, 713 
Minitab example for, 400 
sign test v., 607 
SPSS example for, 406-407 
Wilcoxon rank sum test v., 625 
Wilcoxon signed rank test v., 
613-615, 614f 
Tukey, John Wilder, 499-500 
Tukey's method, 536 
example for, 538-541 
implementation of, 537b, 538t 
Minitab example for, 546 
SAS example, 549-554 
SPSS example, 547 
Two independent sample test, matched 
pairs test v., 384-385 
Two-way analysis of variance, 470. See 
also Randomized complete block 
design 


Type I error 
Bayesian hypothesis testing, 584-588 
definition of, 341, 341t 
examples for, 342-344 
exercises for, 348-349 
sample size and, 346-348 
Type II error 
Bayesian hypothesis testing, 584-588 
calculation of, 345b 
definition of, 341, 341t 
examples for, 342-346 
exercises for, 348-349 
sample size and, 346-348 


U 
Ulam, Stanislaw, 657-658 
UMP tests. See Uniformly most powerful 
tests 
UMVUE. See Uniformly minimum 
variance unbiased estimator 
Unbiased estimators, 247-252 
consistency of, 266b 
definition of, 247 
examples for 
Bernoulli population, 247 
calculation of, 249-250 
method of moments, 250 
proof of, 251 
sample mean as, 249 
uniqueness of, 249 
exercises for, 262-265 
mean square error, 250 
Minitab example for, 283-284 
Rao-Blackwell theorem and, 262 
sample mean as, 247-248 
sample variance as, 248 
Uniform probability distribution, 
122-125, 122f 
definition of, 122 
of discrete random variable, 96 
examples for, 123-125 
likelihood function for, 24f, 240 
mean, variance and mgf of uniform 
random variable, 123b, 124 
random variable simulation with, 
221-222 
Uniform random variable, mean, 
variance and mgf of, 123b, 124 
Uniformly minimum variance unbiased 
estimator (UMVUE), 277-279 
definition of, 273, 279 
examples for, 277-279 
Uniformly most powerful (UMP) tests, 
for composite hypotheses, 
355-356 
Union, 748, 748f 
Univariate data, simple model for, 
727-729 
Univariate random variable, probability 
function of, 146 
Universal set, 747 
Upper quartile 
definition of, 27 
example for, 28-29 
Utility, in Bayesian decision theory, 589 


V 


Variables. See specific variables 
Variance 
alternate method of estimating, 287 
Bayesian point estimation, 575-576 
of binomial random variable, 
101-102, 118-119, 118b 
of chi-square distribution, 192 
of chi-square random variables, 136b 
confidence interval for, 326-327, 
326b 
definition of, 26 
grouped, 30 
of discrete random variables 
definition of, 95 
examples of, 96-98 
examples for, 28 
in experimental design, 470-471 
of exponential random variables, 
134b 
of gamma random variable, 132b 
hypothesis test for, 368-369b 
equality of, 380-382, 381b 
jackknife method for, 659 
large sample confidence interval and, 
302 
of least-squares estimator, 424 
in loss function, 491-492 
with median test, 621 
method of moments for, 230-231 
in MSE, 250-251 
of normal random variable, 126b 
of Poisson random variable, 120, 
120b 
properties of, 95b 
sample, 185 
SPSS examples for, 46-47 
of Student t-distribution, 199-200 
test of equality of, 719-724 
for more than two normal 
populations, 722-724 
for two normal populations, 
719-722 
of uniform random variable, 123b, 
124 
Venn diagram, 748, 748f 


W 


Wald, Abraham, 588 
Wald-Wolfowitz test, for testing 
randomness assumption, 517 
Wilcoxon rank sum test, 625-629 
distribution of, 627 
example for, 626-627, 626t, 627t 
for large samples, 627-628b 
example for, 628-629, 628t 
normal approximation v., 652 
procedure for, 625-626b 
rejection regions, 626 
SAS example for, 648-649 
SPSS example for, 646-647 
Wilcoxon signed rank test, 611-617 
examples for, 612-613, 613t, 614t 
hypothesis testing procedure by, 
611-612b 
for large samples, 615b 
example for, 616-617, 616-617t 
Minitab example for, 643 
normal approximation v., 652 
paired comparisons, 617-618 
sign test v., 611 
t-test v., 613-615, 614f 
usefulness of, 617 
Wolfowitz, Jacob, 599-600 


Z 


z-Test 
for outliers, 709 
SAS example for, 407 


